spectrum location intelligence - pitney bowes · 2020-01-28 · location intelligence for big data...


Spectrum™ Location Intelligence for Big Data Version 4.0

Spectrum™ Location Intelligence for Big Data User Guide

Table of Contents

1 - Welcome

What is Spectrum™ Location Intelligence for Big Data?
Spectrum™ Location Intelligence for Big Data Architecture
System Requirements and Dependencies

2 - Spatial

Installing the SDK
Hive User-Defined Spatial Functions
Spark

3 - Appendix

PGD Builder
Download Permissions
Operators and Syntax Delimiters

1 - Welcome

In this section

What is Spectrum™ Location Intelligence for Big Data?
Spectrum™ Location Intelligence for Big Data Architecture
System Requirements and Dependencies

What is Spectrum™ Location Intelligence for Big Data?

Pitney Bowes Spectrum™ Location Intelligence for Big Data is a toolkit for processing enterprise data for large-scale spatial analysis. Billions of records can be processed in parallel using MapReduce, Hive, and Apache Spark's cluster processing framework. Processing that took weeks with traditional techniques can be completed in a few hours with this product.

Spectrum™ Location Intelligence for Big Data Architecture

What is Spectrum™ Location Intelligence for Big Data?

Spectrum™ Location Intelligence for Big Data transforms and packages Location Intelligence components into an SDK for Big Data platforms like Hadoop, for use with Spark, MapReduce, and Hive.

The SDK provides:

• Integration APIs for Location Intelligence
• Input datasets and metadata

API Types:

• Pre-built Spark and Hive UDF wrappers for Location Intelligence operations
• Core Location Intelligence APIs with sample MapReduce, Hive, and Spark programs (security enabled via Kerberos, and Apache Sentry for Hive)

System Requirements and Dependencies

Spectrum™ Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system.

This product is verified on the following Hadoop distributions:

• Cloudera 5.12 and 6.x
• Hortonworks 3.x
• EMR 5.10 and 5.20

To use these jar files, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR, and with developing applications for distributed processing. For more information, refer to the Hortonworks, Cloudera, or EMR documentation.

To use the product, the following must be installed on your system:

for Hive:

• Hive version 1.2.1 or above

for Hive Client:

• Beeline, for example

for Spark and Zeppelin Notebook:

• Java JDK version 1.8 or above
• Hadoop version 2.6.0 or above
• Spark version 2.0 or above
• Zeppelin Notebook is not supported in Cloudera

2 - Spatial

This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations, and the ability to read TAB files.

Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations so that big data processing systems can be used for spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory R-tree creation and searching.

Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.

In this section

Installing the SDK
Hive User-Defined Spatial Functions
Spark

Installing the SDK

To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.

For the purposes of this guide, we will:

• use a user called pbuser
• install everything into /pb

Perform the following steps from a node in your cluster, such as the master node.

1. Create the install directory and give ownership to pbuser.

    sudo mkdir /pb
    sudo chown pbuser:pbuser /pb

2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:

    /pb/temp/spectrum-bigdata-locationintelligence-version.zip

3. Extract the Location Intelligence distribution.

    mkdir /pb/li
    mkdir /pb/li/software
    unzip /pb/temp/spectrum-bigdata-locationintelligence-version.zip -d /pb/li/software

4. Create an install directory on HDFS and give ownership to pbuser.

    sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
    sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb

5. Upload the distribution into HDFS.

    hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li

Hive User-Defined Spatial Functions

Hive user-defined functions (UDFs) create MapReduce jobs from SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in spectrum-bigdata-li-hive-<version>.jar.

Refer to the table below to quickly navigate to the Hive UDFs described in this document.

Type                    Description                                                                               Name
Constructor Functions   Construct an instance of WritableGeometry from supported geometry representation formats  FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point
Grid Functions          Grid processing functions                                                                 GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID
Measurement Functions   Geometry measurement functions                                                            Area, ClosestPoints, Distance, Length, Perimeter
Observer Functions      Geometry observer functions                                                               ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
Persistence Functions   Serialize an instance of WritableGeometry to supported geometry representation formats    ToGeoJSON, ToKML, ToWKB, ToWKT
Predicate Functions     Geometry predicate functions                                                              Disjoint, Intersects, Overlaps, Within, IsNullGeometry
Processing Functions    Geometry processing functions                                                             Buffer, ConvexHull, Intersection, Transform, Union
Search Functions        Spatial search functions                                                                  LocalPointInPolygon, LocalSearchNearest

Setup

This topic assumes the product is installed to /pb/li/software as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:

1. Proceed according to your platform.

Cloudera:

Copy the Hive jar for Location Intelligence to the HiveServer node:

    /pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar

In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:

    /pb/li/software/hive/lib

Hortonworks:

On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:

    sudo mkdir /usr/hdp/current/hive-server2/auxlib/

Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

    sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar /usr/hdp/current/hive-server2/auxlib/

2. Restart all Hive services.

3. Launch Beeline, or some other Hive client, for the remaining step:

    beeline -u jdbc:hive2://localhost:10000/default -n pbuser

4. Register the spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).

    create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
    create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
    create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
    create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
    create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

    create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
    create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
    create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
    create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

    create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
    create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
    create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
    create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
    create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

    create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
    create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
    create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
    create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
    create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

    create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
    create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
    create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
    create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
    create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

    create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
    create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
    create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
    create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
    create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
    create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

    create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
    create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
    create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
    create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
    create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
    create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

    create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
    create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
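The registration statements above all follow one pattern: a function name bound to a class of the same name in a sub-package of com.pb.bigdata.spatial.hive. As a minimal sketch, the script below generates the statements so they can be saved to a file and replayed in a Hive client. The helper name registration_statements and the UDFS dictionary are hypothetical and not part of the product; only the function, package, and class names come from this guide.

```python
# Hypothetical helper (not shipped with the product): builds the Hive
# registration statements from the function and package names in this guide.
BASE_PACKAGE = "com.pb.bigdata.spatial.hive"

# Sub-package -> function names, in the order the guide registers them.
UDFS = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
    "measurement": ["Area", "ClosestPoints", "Distance", "Length", "Perimeter"],
    "processing": ["ConvexHull", "Intersection", "Buffer", "Union", "Transform"],
    "observer": ["ST_X", "ST_XMax", "ST_XMin", "ST_Y", "ST_YMax", "ST_YMin"],
    "grid": ["GeoHashBoundary", "GeoHashID", "HexagonBoundary", "HexagonID",
             "SquareHashBoundary", "SquareHashID"],
    "search": ["LocalSearchNearest", "LocalPointInPolygon"],
}


def registration_statements(temporary=True):
    """Return one 'create function' statement per UDF."""
    keyword = "temporary " if temporary else ""
    return [
        f"create {keyword}function {name} as '{BASE_PACKAGE}.{package}.{name}';"
        for package, names in UDFS.items()
        for name in names
    ]


if __name__ == "__main__":
    # Print the statements so they can be redirected into a .hql file.
    print("\n".join(registration_statements(temporary=True)))
```

Passing temporary=False produces the permanent form (create function ...), which does not have to be repeated in each new Hive session.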

Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. The job may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).

WritableGeometry

This is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

    SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

    SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

    SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that operate on it. For example:

To calculate the length of a geometry:

    SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

    SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. It returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.

    SELECT IsNullGeometry(null);

    SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

    SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html

Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions

Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

    create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

    fromGeoJSON(String jsonGeometry)

Parameters

Parameter       Type     Description
jsonGeometry    String   The geometry in GeoJSON format.

Return Values

Return Type         Description
WritableGeometry    The geometry from GeoJSON format.

Examples

    SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

    SELECT FromGeoJSON(t.geometry) FROM hivetable t;

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

    create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

    fromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter   Type     Description
geometry    String   The geometry in WKT format.
CRS         String   Optional. The coordinate system for the geometry. Default = epsg:4326

Return Values

Return Type         Description
WritableGeometry    The geometry from WKT format.

Examples

    SELECT FromWKT(t.geometry) FROM hivetable t;

    SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

    SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

    create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

    fromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter   Type     Description
geometry    String   The WKB of the geometry in byte array format (byte[ ]).
CRS         String   Optional. The coordinate system for the geometry. Default = epsg:4326

Return Values

Return Type         Description
WritableGeometry    The geometry from WKB format.

Examples

    SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
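To see what the hexadecimal string in the example above encodes, the following sketch decodes it with Python's standard struct module, following the OGC WKB layout for a 2D point (1 byte-order byte, a 4-byte geometry type, then two 8-byte doubles). It only illustrates the input format; the function name decode_wkb_point is hypothetical and is not part of the product.

```python
import struct

# Hex string taken from the FromWKB example in this guide.
WKB_HEX = "010100000000000000000024400000000000002440"


def decode_wkb_point(wkb_hex):
    """Decode a 2D WKB point and return its (x, y) ordinates."""
    data = bytes.fromhex(wkb_hex)
    # Byte 0: 1 = little-endian (NDR), 0 = big-endian (XDR).
    prefix = "<" if data[0] == 1 else ">"
    # Bytes 1-4: geometry type code; 1 means Point.
    (geometry_type,) = struct.unpack_from(prefix + "I", data, 1)
    if geometry_type != 1:
        raise ValueError("this sketch only handles 2D points (type 1)")
    # Bytes 5-20: X and Y as IEEE 754 doubles.
    x, y = struct.unpack_from(prefix + "dd", data, 5)
    return x, y


if __name__ == "__main__":
    print(decode_wkb_point(WKB_HEX))  # (10.0, 10.0)
```

The example string is therefore the WKB form of POINT (10 10).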


FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

    create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

    fromKML(String geometry)

Parameters

Parameter   Type     Description
geometry    String   A KML string. Only the geometry, or the geometry in a placemark, is parsed.

Return Values

Return Type         Description
WritableGeometry    The geometry from KML format.

Examples

    SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.

Function Registration

    create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

    ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter   Type               Description
X           String or Number   The X ordinate.
Y           String or Number   The Y ordinate.
CRS         String             Optional. The coordinate system for the geometry. Default = epsg:4326

Return Values

Return Type         Description
WritableGeometry    The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned in the output.

Examples

    SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

    SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

    SELECT ST_Point('-73.750333', '42.736103');

    SELECT ST_Point(-73.750333, 42.736103);

    SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.

Function Registration

    create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

    ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.

Return Values

Return Type      Description
GeoJSON String   The GeoJSON representation of a geometry.

Examples

    SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of the geometry in the specified WritableGeometry instance.

Function Registration

    create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

    ToWKT(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.

Return Values

Return Type   Description
String        The WKT representation of a geometry.

Examples

    SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.

Function Registration

    create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

    ToWKB(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.

Return Values

Return Type   Description
Byte[ ]       The WKB representation of a geometry, expressed as a byte array.

Examples

    SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted in KML, in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), from the specified WritableGeometry instance.

Function Registration

    create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

    ToKML(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.

Return Values

Return Type   Description
String        The KML representation of a geometry, expressed as a hexadecimal encoded string.

Examples

    SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

    create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

    Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.

Return Values

Return Type   Description
Boolean       True if the two geometry objects have no points in common; otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

    SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

    create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

    Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.

Return Values

Return Type   Description
Boolean       True if there is any direct position in common between the two geometries; otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

    SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

    SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

    create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

    Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.

Return Values

Return Type   Description
Boolean       True if geometry1 overlaps geometry2; otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

    SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

    create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

    Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.

Return Values

Return Type   Description
Boolean       True if geometry2 entirely contains geometry1; otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

    SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

    SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
    FROM book_of_business L, FIRE_RISK_BOUNDRIES R
    WHERE Within(FromWKT(L.location), FromWKT(R.geom))
    GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

    create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

    IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter       Type               Description
inputGeometry   WritableGeometry   The input geometry to be checked for a null or empty value.

Return Values

Return Type   Description
Boolean       True if the geometry is null or empty; otherwise False.

Examples

    SELECT IsNullGeometry(null);

    SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

    SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
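The "exterior ring minus interior rings" rule can be illustrated with a small planar sketch. It uses the shoelace formula on Cartesian coordinates, which is an assumption made here for illustration; this guide does not state how the product computes area internally, and spherical computation is not covered. The names ring_area and polygon_area are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) tuples.

    Uses the shoelace formula; the ring may be open or explicitly closed.
    """
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of a polygon: exterior ring area minus the areas of its holes."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_area(square, [hole]))  # 96.0
```

A 10 x 10 square with a 2 x 2 hole gives 100 - 4 = 96 square units, in whatever unit the coordinates use.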

Function Registration

    create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

    Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter         Type               Description
geometry          WritableGeometry   The input geometry.
areaUnits         String             The desired return unit type. For valid values, see Area Units.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units:

Value          Description
sq mi          square miles
sq km          square kilometers
sq in          square inches
sq ft          square feet
sq yd          square yards
sq mm          square millimeters
sq cm          square centimeters
sq m           square meters
sq survey ft   square US Survey feet
sq nmi         square nautical miles
acre           acres
ha             hectares

Return Values

Return Type   Description
Double        The area of the geometry.

Examples

    SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

    SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

    create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

    ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.

Return Values

Return Type               Description
Array<WritableGeometry>   The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

    SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2
    FROM hivetable t
    LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

    create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

    Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter         Type               Description
geometry1         WritableGeometry   The first instance of a WritableGeometry.
geometry2         WritableGeometry   The second instance of a WritableGeometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles

Return Values

Return Type   Description
Double        The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

    SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

    SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
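The difference between the CARTESIAN and SPHERICAL computation types can be sketched for two points. The code below is illustrative only: it uses the haversine great-circle formula and a mean Earth radius as stand-ins for "spherical logic", both of which are assumptions, since this guide does not document the formula or the Earth model the product uses. The function names are hypothetical.

```python
import math

# Mean Earth radius in meters. An assumption of this sketch, not a value
# taken from the product.
EARTH_RADIUS_M = 6371008.8


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the plane, in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def haversine_distance_m(lon1, lat1, lon2, lat2, radius_m=EARTH_RADIUS_M):
    """Great-circle distance in meters between two long/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Planar: a 3-4-5 right triangle.
    print(cartesian_distance(0, 0, 3, 4))
    # Spherical: a quarter of the equator.
    print(haversine_distance_m(0, 0, 90, 0))
```

Treating long/lat degrees as planar coordinates would give a distance in degrees rather than in ground units, which is consistent with SPHERICAL being the only valid type listed for geographic coordinate systems.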

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN The geometry coordinates are interpreted using cartesian logic.

SPHERICAL The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN The geometry coordinates are interpreted using cartesian logic.

SPHERICAL The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
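The rule that a polygon's perimeter is the sum of the lengths of all its rings, holes included, can be sketched in a few lines of Python. This is only a planar (CARTESIAN-style) illustration with hypothetical function names; it is not the product's implementation.

```python
import math


def ring_length(ring):
    """Length of a closed ring given as [(x, y), ...] where first == last."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )


def polygon_perimeter(exterior, holes=()):
    """Perimeter = exterior ring length plus the length of every hole ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
    print(polygon_perimeter(square, [hole]))
```

Note that adding a hole increases the perimeter, even though it decreases the area.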

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The unit type of the offset. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN The geometry coordinates are interpreted using cartesian logic.

SPHERICAL The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
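To make the effect of the resolution parameter concrete, the Python sketch below approximates the buffer of a single point on a flat plane. It assumes the vertices are placed on the circle itself; whether the product inscribes or circumscribes the circle is not stated in this guide. The function name buffer_point is hypothetical.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with
    `resolution` straight-line segments. Returns a closed ring."""
    step = 2 * math.pi / resolution
    ring = [
        (x + offset * math.cos(i * step), y + offset * math.sin(i * step))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


if __name__ == "__main__":
    octagon = buffer_point(0.0, 0.0, 1.0, 8)
    print(len(octagon) - 1, "segments")
```

A resolution of 8 yields eight segments (an octagon); doubling the resolution doubles the number of vertices that must be stored and processed.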

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
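For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. This is a well-known textbook technique chosen for illustration; the guide does not say which algorithm the product uses. The function name is hypothetical.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order
    (Andrew's monotone chain). Collinear boundary points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Z component of (a - o) x (b - o); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

Interior points, such as (1, 1) in the usage line, never appear in the result.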

Intersection

Description

The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_Point(30, 20), 'epsg:3857');
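As an illustration of what a coordinate transformation involves, the Python sketch below converts a long/lat point (EPSG:4326) to spherical Web Mercator coordinates (EPSG:3857) using the standard forward formula. It covers only this one pair of coordinate systems and is not the product's implementation; the function name is hypothetical.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # radius used by EPSG:3857 (WGS 84 semi-major axis)


def lonlat_to_web_mercator(lon, lat):
    """Forward spherical Mercator projection: degrees in, meters out."""
    x = SPHERE_RADIUS_M * math.radians(lon)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


if __name__ == "__main__":
    print(lonlat_to_web_mercator(30.0, 20.0))
```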

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible, because they return the ordinates of a point geometry as numeric values.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) of a WritableGeometry.

The following Observer index functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
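The idea of filtering XY records by the bounds of a geometry can be sketched in Python as follows. The sketch computes the four MBR values from a list of vertices and keeps only the points that fall inside them. It is a conceptual illustration with hypothetical names, not a description of how the UDFs are implemented.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(points, bounds):
    """Keep the (x, y) points that lie within the bounds, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in points if xmin <= x <= xmax and ymin <= y <= ymax]


if __name__ == "__main__":
    triangle = [(40, 40), (20, 45), (45, 30), (40, 40)]
    bounds = mbr(triangle)
    print(bounds)
    print(filter_by_bounds([(30, 35), (10, 10), (45, 45)], bounds))
```

A bounds filter is cheap but coarse: a point can fall inside the MBR and still lie outside the geometry itself.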

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash, but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
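The relationship between an ID and its rectangular cell can be illustrated with the publicly documented geohash scheme. The Python sketch below decodes a geohash string into the bounds of its cell. It assumes the standard base-32 geohash encoding; this guide does not confirm that the product's IDs match it exactly, and the function name is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of a standard geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


if __name__ == "__main__":
    print(geohash_bounds("ezs42"))
```

Each additional character halves the cell several more times, which is why longer IDs correspond to smaller cells.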

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
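The hash-then-aggregate pattern in the last query can be mimicked in Python. The sketch below encodes points with the publicly documented geohash scheme and counts how many points fall in each cell. It assumes the standard base-32 geohash encoding, which this guide does not confirm is identical to the product's IDs; all names are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a long/lat point as a standard geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bits = 0, 0
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | bit
        bits += 1
        is_lon = not is_lon
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def count_by_cell(points, precision):
    """Count (lon, lat) points per grid cell, like GROUP BY on the hash ID."""
    return Counter(geohash_id(lon, lat, precision) for lon, lat in points)


if __name__ == "__main__":
    pts = [(-5.6, 42.6), (-5.61, 42.61), (10.0, 10.0)]
    print(count_by_cell(pts, 5))
```

Because IDs that share a prefix belong to the same larger cell, sorting by ID tends to keep nearby points together.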

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID('-73.750333', '42.736103', 3);

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions.

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
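Conceptually, a point-in-polygon search tests the input point against each candidate polygon and returns the attributes of every match. The Python sketch below does this on a flat plane with the classic ray-casting test. It is an illustration only: the record layout and function names are hypothetical, boundary points are not treated specially, and the product's indexed search is not described in this guide.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray to the right of (x, y) crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(x, y, exterior, holes=()):
    """Inside the exterior ring and not inside any hole."""
    return point_in_ring(x, y, exterior) and not any(
        point_in_ring(x, y, h) for h in holes
    )


def find_containing(x, y, records):
    """Return the attributes of every record whose polygon contains the point.
    Each record is (attributes, exterior_ring, holes); a hypothetical layout."""
    return [attrs for attrs, ext, holes in records if point_in_polygon(x, y, ext, holes)]


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
    records = [({"state": "A"}, square, [])]
    print(find_containing(1, 1, records))
```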

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for the supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.

Return Values

Return Type Description

geometry The geometry or geometries in the specified TAB file or shapefile that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3',
'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
'downloadLocation', '/pb/search/download',
'downloadGroup', 'pbdownloads',
'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
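As a complement to the HDFS example, the following shell sketch prints a LocalSearchNearest query that reads a TAB file from a local path and uses the distance options from the Options table. The function name nearest_query, the path /pb/li/data/STATECAP.TAB, and the Miles column name are illustrative assumptions rather than values defined by the product. The sketch only prints the query text.

```shell
#!/bin/sh
# Hypothetical helper: prints a HiveQL query; it does not contact Hive itself.
# $1 = local path to the TAB file (assumed to exist on the master node and every data node)
# $2 = maximum number of candidates to return per search point
nearest_query() {
  cat <<EOF
SELECT search_points.id, nearestresult.capital, nearestresult.Miles
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), '$1',
map('maxCandidates', '$2',
    'maxDistance', '25',
    'distanceUnit', 'mi',
    'returnDistanceColumnName', 'Miles')) nearestresult;
EOF
}

# Print a query for an assumed local TAB file, keeping up to 3 results per point
nearest_query /pb/li/data/STATECAP.TAB 3
```

The printed text could be saved to a file and run with a Hive client, for example with Beeline's -f option. Because the data source is local, the remote download options are left out.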


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
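The launch step can be wrapped in a small dry-run helper so that the command can be reviewed before it is submitted. This is a minimal sketch: the function name hexgen_command, the application name hexgen-usa, and the configuration path /dir/on/server/hexgen.conf are assumptions made for illustration. The helper prints the command and does not start a job.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the spark-submit command instead of running it.
# $1 = application name, $2 = path to the hexgen jar,
# $3 = HDFS output directory, $4 = path to the configuration
hexgen_command() {
  printf '%s ' spark-submit \
    --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
    --master yarn --deploy-mode cluster \
    --name "$1" "$2" -output "$3" -conf "$4"
  printf '%s\n' -overwrite
}

# Print the command for an assumed application name and configuration path
hexgen_command hexgen-usa /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
  /dir/on/hdfs/output /dir/on/server/hexgen.conf
```

Arguments are printed without quoting, so the sketch assumes that none of the values contain spaces.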

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location. That ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to.

df1Longitude Column The longitude value from the first dataframe.

df1Latitude Column The latitude value from the first dataframe.

df2Longitude Column The longitude value from the second dataframe.

df2Latitude Column The latitude value from the second dataframe.

maxDistance Length The buffer length around point 1 to search for point 2.

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

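This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

// Build the search distance (0.5 miles) from a numeric value and a MapInfo unit code
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

// Join joinDF to baseDF using a geohash precision of 7
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

import com.mapinfo.midev.unit.{Length, LinearUnit}

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

// The DistanceColumnName option adds the calculated distance as the outputDistance column
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))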
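To make the meaning of the join concrete, the following shell sketch performs a brute-force distance join over two small CSV files using the haversine formula. It is not how joinByDistance is implemented (the library uses a geohash-based primary join). The function name distance_join, the CSV layout id,longitude,latitude, and the mean earth radius of 3958.8 miles are assumptions made for illustration.

```shell
#!/bin/sh
# Brute-force distance join over two CSV files with rows of the form: id,longitude,latitude
# $1 = base CSV, $2 = CSV to join, $3 = maximum distance in miles
# Prints base_id,join_id,distance_in_miles for every pair within the limit.
# Example call (file names are placeholders): distance_join customers.csv pois.csv 0.5
distance_join() {
  awk -F, -v max="$3" '
    function rad(deg) { return deg * 3.141592653589793 / 180 }
    # Great-circle distance in miles (haversine formula, assumed radius 3958.8 mi)
    function miles(lon1, lat1, lon2, lat2,    s1, s2, a) {
      s1 = sin(rad(lat2 - lat1) / 2)
      s2 = sin(rad(lon2 - lon1) / 2)
      a = s1 * s1 + cos(rad(lat1)) * cos(rad(lat2)) * s2 * s2
      return 3958.8 * 2 * atan2(sqrt(a), sqrt(1 - a))
    }
    # First file on the command line: remember the points to join against
    NR == FNR { id[FNR] = $1; lon[FNR] = $2; lat[FNR] = $3; n = FNR; next }
    # Second file: compare each base point with every remembered point
    {
      for (i = 1; i <= n; i++) {
        d = miles($2, $3, lon[i], lat[i])
        if (d <= max) printf "%s,%s,%.3f\n", $1, id[i], d
      }
    }
  ' "$2" "$1"
}
```

Every pair of points is compared, so the cost grows with the product of the two input sizes. This is why a distributed join needs a coarser first pass, such as the geohash precision parameter described above.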


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case:

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
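The EMR edit in step 2 can also be scripted. The following is a minimal sketch, assuming a GNU or BSD sed that supports the -i and -E options. The function name enable_default_configuration is hypothetical, and the sketch keeps a backup copy of the file with a .bak suffix.

```shell
#!/bin/sh
# Hypothetical helper for the EMR step: uncomments the line and sets the value to true.
# $1 = path to hue.ini (on EMR, /etc/hue/conf/hue.ini); a backup is written to "$1.bak"
# Example call: enable_default_configuration /etc/hue/conf/hue.ini
enable_default_configuration() {
  sed -i.bak -E \
    's/^([[:space:]]*)#*[[:space:]]*use_default_configuration[[:space:]]*=[[:space:]]*false/\1use_default_configuration=true/' \
    "$1"
}
```

The expression preserves any leading indentation and leaves every other line untouched. Editing the real file typically requires elevated privileges, and the remaining steps still apply afterwards.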


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching polygon-based data. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
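When a directory holds several TAB files, the commands can be generated in a loop. The following is a dry-run sketch that assumes the utility is invoked as PGDBuilder, as in the usage line above. The function name pgd_commands is hypothetical, and the helper prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints one PGDBuilder command per TAB file found.
# $1 = directory containing TAB files, $2 = value for -p (optional)
# Example call: pgd_commands /pb/li/data 4
pgd_commands() {
  for tab in "$1"/*.tab "$1"/*.TAB; do
    [ -e "$tab" ] || continue          # skip patterns that matched nothing
    if [ -n "$2" ]; then
      printf 'PGDBuilder -f %s -p %s\n' "$tab" "$2"
    else
      printf 'PGDBuilder -f %s\n' "$tab"
    fi
  done
}
```

Reviewing the printed list first makes it easier to choose a -p value, since a seamless TAB uses one thread per sub-TAB.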


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, and the directory should have Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
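After step 5, the result can be verified with a short check. The following sketch assumes that the ls -ld output has the usual layout (mode string in the first field, group in the fourth). The function name check_download_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeeds only if the directory has mode 775 and the expected group.
# $1 = download directory, $2 = expected group name
# Example call: check_download_dir /pb/downloads pbdownloads
check_download_dir() (
  listing=$(ls -ld "$1") || exit 1
  # Field 1 of the listing is the mode string; field 4 is the group
  mode=$(printf '%s\n' "$listing" | awk '{ print substr($1, 1, 10) }')
  group=$(printf '%s\n' "$listing" | awk '{ print $4 }')
  if [ "$mode" != "drwxrwxr-x" ]; then
    echo "unexpected mode $mode (expected drwxrwxr-x, that is 775)" >&2
    exit 1
  fi
  if [ "$group" != "$2" ]; then
    echo "unexpected group $group (expected $2)" >&2
    exit 1
  fi
  echo "ok"
)
```

The function body runs in a subshell, so its variables and exit calls do not affect the calling shell. Group membership of the individual users can be confirmed separately with the id command.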


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if a value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
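The behavior of the two Like wildcards can be illustrated with shell pattern matching, where % corresponds to * and _ corresponds to ?. This is only an analogy and not the MI SQL implementation. The function name like_match is hypothetical, the comparison is case-sensitive (which may differ from MI SQL), and the sketch assumes that the pattern contains no characters that are special to the shell.

```shell
#!/bin/sh
# Hypothetical illustration of the Like wildcards using shell patterns:
# % (zero, one, or more characters) maps to *, and _ (exactly one character) maps to ?.
# Assumes the pattern contains none of the shell pattern characters * ? [ \
# $1 = value to test, $2 = Like pattern
like_match() (
  glob=$(printf '%s' "$2" | tr '%_' '*?')
  case $1 in
    $glob) exit 0 ;;
    *)     exit 1 ;;
  esac
)

like_match "Central Park" "%Park" && echo "Central Park matches %Park"
like_match "Parkway" "%Park" || echo "Parkway does not match %Park"
```

Both messages are printed: the first value ends with Park, while the second has extra characters after Park that the pattern does not allow.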


Arithmetic Operators

Operator Definition

+ Addition; also concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces in their names or other special characters.

Examples


Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
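When filter expressions are assembled in a script, the doubling rule for embedded single quotes can be applied automatically. The following is a minimal sketch; the function name sql_string and the sample value are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: wraps a value in single quotes and doubles any embedded single quote.
# $1 = raw string value
sql_string() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

sql_string "Joe's Diner"   # prints 'Joe''s Diner'
```

The helper covers string literals only. Identifiers that need quoting are enclosed in double quotation marks instead, as described above.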


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

Page 2: Spectrum Location Intelligence - Pitney Bowes · 2020-01-28 · Location Intelligence for Big Data 4.0 Spectrum Location Intelligence for Big Data User Guide 6. 2 -Spatial. This section

Table of Contents

1 - Welcome

What is Spectrumtrade Location Intelligence for Big

Data 4

Spectrumtrade Location Intelligence for Big Data

Architecture 5

System Requirements and Dependencies 6

2 - Spatial

Installing the SDK 8

Hive User-Defined Spatial Functions 9

Spark 72

3 - Appendix

PGD Builder 81

Download Permissions 83

Operators and Syntax Delimiters 84

1 - Welcome

In this section

What is Spectrumtrade Location Intelligence for Big Data 4 Spectrumtrade Location Intelligence for Big Data Architecture 5 System Requirements and Dependencies 6

Welcome

What is Spectrumtrade Location Intelligence for Big Data

The Pitney Bowes Spectrumtrade Location Intelligence for Big Data is a toolkit for processing enterprise data for large scale spatial analysis Billions of records can be processed in parallel using MapReduce Hive and Apache Sparks cluster processing framework yielding results faster than ever Unlike traditional processing techniques that used to take weeks to process the data now the data processing can be done in a few hours using this product

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 4

Welcome

Spectrumtrade Location Intelligence for Big Data Architecture

What is Spectrumtrade Location Intelligence for Big Data

The Spectrumtrade Location Intelligence for Big Data transforms and packages Location Intelligence components into an SDK for Big Data platforms like Hadoop for Spark MapReduce and Hive

SDK provides

bull Integration APIs for Location Intelligence bull Input datasets and metadata

API Types

bull Pre-built Spark and Hive UDF wrappers for Location Intelligence operations bull Core Location Intelligence APIs with sample MapReduce Hive and Spark programs (security enabled via Kerberos and Apache Sentry for Hive)

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 5

Welcome

System Requirements and Dependencies

Spectrumtrade Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system

This product is verified on the following Hadoop distributions

bull Cloudera 512 and 6x bull Hortonworks 3x bull EMR 510 and 520

To use these jar files you must be familiar with configuring Hadoop in Hortonworks Cloudera or EMR and developing applications for distributed processing For more information refer to Hortonworks Cloudera or EMR documentation

To use the product the following must be installed on your system

for Hive

bull Hive version 121 or above

for Hive Client

bull Beeline for example

for Spark and Zeppelin Notebook

bull Java JDK version 18 or above bull Hadoop version 260 or above bull Spark version 20 or above bull Zeppelin Notebook is not supported in Cloudera

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 6

2 - Spatial This section describes the Spark jobs and Hive user defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files

Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to use the big data processing systems for spatial data analysis The LI SDK provides geometry and coordinate operations the ability to read TAB files and in-memory r-tree creation and searching

Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive

In this section

Installing the SDK 8 Hive User-Defined Spatial Functions 9 Spark 72

Spatial

Installing the SDK

To use spatial functions for Spectrumtrade Location Intelligence for Big Data the Hadoop cluster must have reference data and libraries accessible from the master node

For the purposes of this guide we will

bull use a user called pbuser bull install everything into pb

Perform the following steps from a node in your cluster such as the master node

1 Create the install directory and give ownership to pbuser

sudo mkdir pbsudo chown pbuserpbuser pb

2 Add the Location Intelligence distribution zip to the node at a temporary location for example

pbtempspectrum-bigdata-locationintelligence-versionzip

3 Extract the Location Intelligence distribution

mkdir pblimkdir pblisoftwareunzip pbtempspectrum-bigdata-locationintelligence-versionzip -d pblisoftware

4 Create an install directory on hdfs and give ownership to pbuser

sudo -u hdfs hadoop fs -mkdir -p hdfspbli sudo -u hdfs hadoop fs -chown -R pbuserpbuser hdfspb

5 Upload the distribution into HDFS

hadoop fs -copyFromLocal pblisoftware hdfspbli

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 8

Spatial

Hive User-Defined Spatial Functions

Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax so there is no need to write code Spectrumtrade Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user defined functions for Geometry operations and to work with grids in the spectrum-bigdata-li-hive-ltversiongtjar

Refer to the table below to quickly navigate to Hive UDFs described in this document

Type                    Description                                  Name
Constructor Functions   Construct an instance of WritableGeometry    FromGeoJSON, FromKML, FromWKB,
                        from supported geometry representation       FromWKT, ST_Point
                        formats
Grid Functions          Grid processing functions                    GeoHashBoundary, GeoHashID,
                                                                     HexagonBoundary, HexagonID,
                                                                     SquareHashBoundary, SquareHashID
Measurement Functions   Geometry measurement functions               Area, ClosestPoints, Distance,
                                                                     Length, Perimeter
Observer Functions      Geometry observer functions                  ST_X, ST_Y, ST_XMax, ST_XMin,
                                                                     ST_YMax, ST_YMin
Persistence Functions   Serialize an instance of WritableGeometry    ToGeoJSON, ToKML, ToWKB, ToWKT
                        to supported geometry representation
                        formats
Predicate Functions     Geometry predicate functions                 Disjoint, Intersects, Overlaps,
                                                                     Within, IsNullGeometry
Processing Functions    Geometry processing functions                Buffer, ConvexHull, Intersection,
                                                                     Transform, Union
Search Functions        Spatial search functions                     LocalPointInPolygon,
                                                                     LocalSearchNearest

Setup

This topic assumes the product is installed to /pb/li/software as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:

1. Proceed according to your platform:

Cloudera

    Copy the Hive jar for Location Intelligence to the HiveServer node:

    /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar

    In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:

    /pb/li/software/hive/lib/

Hortonworks

    On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:

    sudo mkdir /usr/hdp/current/hive-server2/auxlib/

    Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

    sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/

2. Restart all Hive services.

3. Launch Beeline, or some other Hive client, for the remaining step:

beeline -u jdbc:hive2://localhost:10000/default -n pbuser

4. Register spatial user-defined functions. Include the temporary keyword after create, as in the statements here, if you want a temporary function (this step then needs to be redone for every new Hive session).

create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
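Because the registration statements follow one pattern, they can be generated rather than typed. The following is a minimal Python sketch, not part of the product: the helper name registration_statements and the UDF_PACKAGES mapping are hypothetical, only three of the packages are shown, and the class names assume the com.pb.bigdata.spatial.hive.<package>.<Name> pattern used in the statements above.

```python
# Hypothetical helper: builds Hive registration statements for a subset of the
# UDFs in this guide. Class names assume the pattern
# com.pb.bigdata.spatial.hive.<package>.<Name>.
UDF_PACKAGES = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
}


def registration_statements(packages, temporary=True):
    """Return one 'create [temporary] function' statement per UDF."""
    keyword = "create temporary function" if temporary else "create function"
    statements = []
    for package, names in packages.items():
        for name in names:
            class_name = f"com.pb.bigdata.spatial.hive.{package}.{name}"
            statements.append(f"{keyword} {name} as '{class_name}';")
    return statements


if __name__ == "__main__":
    # Print the statements so they can be pasted into Beeline or saved to a file.
    print("\n".join(registration_statements(UDF_PACKAGES)))
```

Passing temporary=False produces the permanent form shown in the Function Registration section of each function.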

Note:

• If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).

WritableGeometry

This is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats such as WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats such as WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that operate on it. For example:

To calculate the length of a geometry:

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. It returns true if the geometry is NULL or empty; otherwise it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html

Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions

Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

FromGeoJSON(String jsonGeometry)

Parameters

Parameter     Type    Description
jsonGeometry  String  The geometry in GeoJSON format.

Return Values

Return Type       Description
WritableGeometry  The geometry from GeoJSON format.

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;
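When the GeoJSON text is built by a client program rather than stored in a table, it helps to let a JSON library produce the string so that quoting is correct. The following Python sketch is illustrative only; the helper name point_geojson is hypothetical, and embedding the result in single quotes assumes the text itself contains no single quotes.

```python
import json


def point_geojson(x, y):
    """Return a GeoJSON Point string for the given X and Y ordinates."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


# Embed the GeoJSON text as a single-quoted string literal in a query.
# GeoJSON uses double quotes, so it does not clash with the outer single quotes.
query = f"SELECT FromGeoJSON('{point_geojson(100.0, 0.0)}');"

if __name__ == "__main__":
    print(query)
```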

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

FromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter  Type    Description
geometry   String  The geometry in WKT format.
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type       Description
WritableGeometry  The geometry from WKT format.

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

FromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter  Type    Description
geometry   String  The WKB of the geometry in byte array format (byte[ ]).
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type       Description
WritableGeometry  The geometry from WKB format.

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
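To see what the hexadecimal string in the example encodes, the WKB bytes can be unpacked by hand. The following Python sketch follows the standard WKB layout for a 2D point (one byte-order byte, a 4-byte geometry type, then two 8-byte doubles); the function name decode_wkb_point is hypothetical, and the sketch handles only plain points, not the other geometry types the product accepts.

```python
import struct


def decode_wkb_point(hex_string):
    """Decode a hex-encoded WKB 2D point and return its (x, y) ordinates."""
    data = bytes.fromhex(hex_string)
    # Byte 0 is the byte-order flag: 1 = little endian, 0 = big endian.
    order = "<" if data[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type; 1 means Point.
    (geometry_type,) = struct.unpack(order + "I", data[1:5])
    if geometry_type != 1:
        raise ValueError("this sketch only decodes WKB points")
    # Bytes 5-20 hold the X and Y ordinates as IEEE 754 doubles.
    x, y = struct.unpack(order + "dd", data[5:21])
    return x, y


if __name__ == "__main__":
    print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

The string used in the example decodes to the point with X = 10 and Y = 10.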

FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

FromKML(String geometry)

Parameters

Parameter  Type    Description
geometry   String  A KML string, where only the geometry, or the geometry in a placemark, will be parsed.

Return Values

Return Type       Description
WritableGeometry  The geometry from KML format.

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter  Type              Description
X          String or Number  The X ordinate.
Y          String or Number  The Y ordinate.
CRS        String            Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type       Description
WritableGeometry  The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned.

Examples

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type     Description
GeoJSON String  The GeoJSON representation of a geometry.

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type  Description
String       The WKT representation of a geometry.

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type  Description
Byte[ ]      The WKB representation of a geometry, expressed as a byte array.

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted in KML, in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type  Description
String       The KML representation of a geometry, expressed as a hexadecimal encoded string.

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if the two geometry objects have no points in common; otherwise False.
             If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if there is any direct position in common between the two geometries; otherwise False.
             If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
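By the definitions given here, Intersects is true when the geometries share at least one position and Disjoint is true when they share none, with Null returned when either input is null. The following Python sketch illustrates that three-valued behavior on axis-aligned rectangles only, which is a deliberate simplification; the Box type and function names are hypothetical and this is not the product's implementation.

```python
from typing import Optional, Tuple

# A rectangle as (xmin, ymin, xmax, ymax); a stand-in for a full geometry.
Box = Tuple[float, float, float, float]


def intersects(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """True if the rectangles share at least one point; None if either is None."""
    if a is None or b is None:
        return None
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def disjoint(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """True if the rectangles share no points; None if either is None."""
    result = intersects(a, b)
    return None if result is None else not result


if __name__ == "__main__":
    print(intersects((0, 0, 2, 2), (1, 1, 3, 3)), disjoint((0, 0, 1, 1), (2, 2, 3, 3)))
```

Rectangles that only touch at a corner or edge still share a point, so they intersect and are not disjoint.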

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if geometry1 overlaps geometry2; otherwise False.
             If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if geometry2 entirely contains geometry1; otherwise False.
             If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;
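The second example tests each insured location (a point) against risk boundaries (polygons). The following Python sketch shows the classic ray-casting test for a point inside a single polygon ring using cartesian coordinates. It is a simplified illustration, not the product's algorithm: the function name point_within_polygon is hypothetical, holes are ignored, and points lying exactly on the boundary are not handled specially.

```python
def point_within_polygon(point, ring):
    """Return True if point (x, y) lies inside the polygon ring (list of (x, y))."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            # X ordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Each crossing to the right of the point toggles inside/outside.
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    square = [(0, 0), (4, 0), (4, 4), (0, 4)]
    print(point_within_polygon((2, 2), square), point_within_polygon((5, 2), square))
```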

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter      Type              Description
inputGeometry  WritableGeometry  The input geometry to be checked for a null or empty value.

Return Values

Return Type  Description
Boolean      True if the geometry is null or empty; otherwise False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
areaUnits        String            The desired return unit type. For valid values, see Area Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

    • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
    • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
    • For engineering coordinate systems: Valid type = CARTESIAN (default)

    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for areaUnits are the following area units:

Value         Description
sq mi         square miles
sq km         square kilometers
sq in         square inches
sq ft         square feet
sq yd         square yards
sq mm         square millimeters
sq cm         square centimeters
sq m          square meters
sq survey ft  square US Survey feet
sq nmi        square nautical miles
acre          acres
ha            hectares

Return Values

Return Type  Description
Double       The area of the geometry.

Examples

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
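The rule that a polygon's area is the area of its exterior ring minus the areas of its interior rings can be sketched with the shoelace formula. The Python below is an illustration under stated assumptions, not the product's implementation: it uses cartesian logic only, assumes coordinates are in meters, and the names ring_area, polygon_area, convert_area, and SQ_M_PER_UNIT are hypothetical.

```python
def ring_area(ring):
    """Unsigned area of a ring (list of (x, y)) using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


# Square meters per unit, for a few of the area units listed above.
SQ_M_PER_UNIT = {"sq m": 1.0, "sq km": 1_000_000.0, "ha": 10_000.0, "sq ft": 0.3048 ** 2}


def convert_area(square_meters, unit):
    """Convert an area in square meters to the requested unit."""
    return square_meters / SQ_M_PER_UNIT[unit]


if __name__ == "__main__":
    outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_area(outer, [hole]))
```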

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type              Description
Array<WritableGeometry>  The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2
FROM hivetable t
LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
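For the simplest case, a point and a line segment, the pair of closest points can be found by projecting the point onto the segment and clamping to its ends. The following Python sketch uses cartesian logic and a hypothetical function name; it illustrates the idea only and does not reproduce how the product handles arbitrary geometries or spherical computation.

```python
def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is closest to point p."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: both ends are the same point.
        return a
    # Parameter of the projection of p onto the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


if __name__ == "__main__":
    print(closest_point_on_segment((2, 3), (0, 0), (4, 0)))
```

When the point already lies on the segment, the function returns that same location, which matches the shared-point case described under Return Values.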

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter        Type              Description
geometry1        WritableGeometry  The first instance of a WritableGeometry.
geometry2        WritableGeometry  The second instance of a WritableGeometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

    • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
    • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
    • For engineering coordinate systems: Valid type = CARTESIAN (default)

    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type  Description
Double       The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
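The difference between the CARTESIAN and SPHERICAL computation types can be illustrated for two points. The Python sketch below treats cartesian distance as straight-line distance in the plane and spherical distance as great-circle distance from the haversine formula. Both the haversine formula and the mean Earth radius constant are assumptions made for illustration; this guide does not state which spherical method or radius the product uses.

```python
import math

# Mean Earth radius in meters; an assumption for this sketch.
EARTH_RADIUS_M = 6371008.8


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


if __name__ == "__main__":
    print(cartesian_distance(0, 0, 3, 4))
    print(spherical_distance_m(0, 0, 90, 0))
```

Treating degrees of longitude and latitude as planar units would give a result in degrees rather than a ground distance, which is why geographic coordinate systems list only SPHERICAL as a valid type.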

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

    • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
    • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
    • For engineering coordinate systems: Valid type = CARTESIAN (default)

    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type  Description
Double       The length of the geometry.

Examples

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
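The role of the linearUnits argument can be sketched as a conversion applied to a length computed in meters. The Python below sums the segment lengths of a polyline with cartesian logic and converts the result; it assumes the input coordinates are in meters, covers only some of the listed units, and uses hypothetical names (polyline_length, METERS_PER_UNIT). It is not the product's implementation.

```python
import math

# Meters per unit, for some of the linear units listed above.
METERS_PER_UNIT = {
    "m": 1.0,
    "km": 1000.0,
    "ft": 0.3048,
    "mi": 1609.344,
    "nmi": 1852.0,
    "survey ft": 1200 / 3937,
}


def polyline_length(points, unit="m"):
    """Sum the segment lengths of a polyline and convert to the given unit."""
    meters = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return meters / METERS_PER_UNIT[unit]


if __name__ == "__main__":
    print(polyline_length([(0, 0), (3, 4), (3, 10)]))
    print(polyline_length([(0, 0), (0, 1000)], "km"))
```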

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

    • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
    • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
    • For engineering coordinate systems: Valid type = CARTESIAN (default)

    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type  Description
Double       The perimeter of the geometry.

Examples

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
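Unlike area, where holes are subtracted, the perimeter adds the lengths of every ring, holes included. The following Python sketch shows this with cartesian logic; the function names ring_length and polygon_perimeter are hypothetical and the code is an illustration, not the product's implementation.

```python
import math


def ring_length(ring):
    """Length around a ring (list of (x, y)), closing it if it is not closed."""
    closed = ring if ring[0] == ring[-1] else ring + [ring[0]]
    return sum(math.dist(a, b) for a, b in zip(closed, closed[1:]))


def polygon_perimeter(exterior, holes=()):
    """Sum of the lengths of the exterior ring and all interior rings."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


if __name__ == "__main__":
    outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_perimeter(outer, [hole]))
```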

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The geometry to buffer.
offset           Number            The distance from the input geometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
resolution       Number            Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

    • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
    • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
    • For engineering coordinate systems: Valid type = CARTESIAN (default)

    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type       Description
WritableGeometry  A geometry consisting of all the points within the offset distance of the input geometry.

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
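The effect of the resolution argument is easiest to see for a point buffer. The Python sketch below approximates a circle around a point with a regular polygon of resolution vertices, using cartesian logic. It is an illustration with a hypothetical function name; in particular, placing the vertices exactly at the offset distance is an assumption, and the product may position the approximating polygon differently.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` vertices."""
    vertices = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        vertices.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    return vertices


if __name__ == "__main__":
    # A resolution of 8 yields an octagon; 12 yields a dodecagon.
    print(len(buffer_point(5, 6, 50, 8)), len(buffer_point(5, 6, 50, 12)))
```

Higher resolutions follow the circle more closely at the cost of more vertices per buffered geometry.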

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry The convex hull is the smallest convex geometry that contains all the points in the input geometry

Function Registration

create function ConvexHull as compbbigdataspatialhiveprocessingConvexHull

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));


Intersection

Description

The Intersection function returns the geometry (a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). It returns the geometry consisting of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
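
For a quick check that does not depend on any table, the following sketch intersects two overlapping squares given as literal WKT and returns the shared area as WKT. The polygons are made-up test values; FromWKT and ToWKT are assumed to be registered as shown elsewhere in this guide.

```sql
-- Two 10 x 10 squares that overlap in the region between (5 5) and (10 10).
SELECT ToWKT(
  Intersection(
    FromWKT('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))', 'epsg:4326'),
    FromWKT('POLYGON ((5 5, 15 5, 15 15, 5 15, 5 5))', 'epsg:4326')
  )
) AS overlap_wkt;
```

Wrapping the call in ToWKT makes the result readable in the Hive console, which is convenient when validating input geometries before running the function against full tables.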


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another directly. The ST_X and ST_Y UDFs make this possible by returning the ordinates of the transformed point.

Another common need is to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs return the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
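
The XY-table case described above can be sketched as follows: build a point from the X and Y columns, transform it, then read the ordinates back with ST_X and ST_Y. The table xy_table and its columns id, x, and y are hypothetical; ST_Point, Transform, ST_X, and ST_Y are used with the signatures shown in this guide.

```sql
-- Reproject the coordinates of a hypothetical XY table from EPSG:4326 to EPSG:3857.
-- Transform works on geometries, so each row is turned into a point first.
SELECT
  id,
  ST_X(Transform(ST_Point(x, y, 'epsg:4326'), 'epsg:3857')) AS x_3857,
  ST_Y(Transform(ST_Point(x, y, 'epsg:4326'), 'epsg:3857')) AS y_3857
FROM xy_table;
```

The function calls are nested rather than staged through a subquery so that no intermediate geometry column has to be stored; the result is again a plain XY table.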


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
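
The four bounds functions can be combined to filter an XY table by the minimum bounding rectangle of a geometry. The following is a sketch only: the table xy_table, its columns id, x, and y, and the literal polygon are hypothetical, and the coordinates in the table are assumed to be in the same coordinate system as the polygon.

```sql
-- Keep only the rows of a hypothetical XY table whose coordinates fall inside
-- the minimum bounding rectangle (MBR) of a literal polygon.
SELECT p.id, p.x, p.y
FROM xy_table p
WHERE p.x BETWEEN ST_XMin(FromWKT('POLYGON ((-74 42, -73 42, -73 43, -74 43, -74 42))', 'epsg:4326'))
              AND ST_XMax(FromWKT('POLYGON ((-74 42, -73 42, -73 43, -74 43, -74 42))', 'epsg:4326'))
  AND p.y BETWEEN ST_YMin(FromWKT('POLYGON ((-74 42, -73 42, -73 43, -74 43, -74 42))', 'epsg:4326'))
              AND ST_YMax(FromWKT('POLYGON ((-74 42, -73 42, -73 43, -74 43, -74 42))', 'epsg:4326'));
```

An MBR filter is a coarse test: for a geometry that is not an axis-aligned rectangle, it can keep points that lie outside the geometry itself, so it is best used as an inexpensive first pass.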


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

Three types of UDFs are provided, one for each grid cell shape:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
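
As a rough illustration of how the ID and Boundary functions pair up, the following sketch computes the cell ID of each point in all three grid types and then decodes each ID back into its cell boundary. It assumes a table named coordinates with x (longitude) and y (latitude) columns, as used in the examples in this chapter; the precision value of 5 and the column aliases are arbitrary choices.

```sql
-- Encode each point to a cell ID, then decode the ID to the cell boundary as WKT,
-- once for each of the three grid types.
SELECT
  GeoHashID(x, y, 5)                               AS geohash_id,
  ToWKT(GeoHashBoundary(GeoHashID(x, y, 5)))       AS geohash_cell,
  HexagonID(x, y, 5)                               AS hexagon_id,
  ToWKT(HexagonBoundary(HexagonID(x, y, 5)))       AS hexagon_cell,
  SquareHashID(x, y, 5)                            AS squarehash_id,
  ToWKT(SquareHashBoundary(SquareHashID(x, y, 5))) AS squarehash_cell
FROM coordinates;
```

Because the ID is a plain string, it can be stored, sorted, and used as a GROUP BY key, while the boundary only needs to be computed when the cell has to be drawn or compared spatially.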


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier (hash ID) of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier (hash ID) of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID('-73.750333', '42.736103', 3);

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

shpCharset
Description: the charset to use when reading a shapefile
Example: 'shpCharset', 'utf-8'

shpCrs
Description: the coordinate reference system to use when reading a shapefile
Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
Description: the operating system group that should be applied to downloaded data on a local file system; the default is the value of pb.download.group set in Hive (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), '/STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
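
When the TAB file is already present at the same local path on the master node and every data node, the options map can be omitted, since it is optional in the syntax. The following is a sketch under that assumption; the table xy_table, its columns, and the path /pb/data/STATECAP.TAB are hypothetical, and the capital and state fields are those of the STATECAP example above.

```sql
-- Point-in-polygon search against a TAB file stored locally on every node.
-- The input point is built from the X and Y columns of a hypothetical XY table.
SELECT p.id, pipresult.capital, pipresult.state
FROM xy_table p
LATERAL VIEW LocalPointInPolygon(
  ST_Point(p.x, p.y, 'epsg:4326'),
  '/pb/data/STATECAP.TAB'
) pipresult;
```

With a plain LATERAL VIEW, Hive drops input rows for which the UDTF returns nothing; use LATERAL VIEW OUTER to keep points that fall inside no polygon, with null values in the pipresult columns.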


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

maxCandidates
Description: the maximum number of results to return (if not set, the default value is 1)
Example: 'maxCandidates', '3'

maxDistance
Description: the maximum distance to search for results (if not set, the default value is no limit)
Example: 'maxDistance', '25'

distanceUnit
Description: the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'

returnDistanceColumnName
Description: the name of the column to use for returning the distance
Example: 'returnDistanceColumnName', 'Miles'

shpCharset
Description: the charset to use when reading a shapefile
Example: 'shpCharset', 'utf-8'

shpCrs
Description: the coordinate reference system to use when reading a shapefile
Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
Description: the operating system group that should be applied to downloaded data on a local file system; the default is the value of pb.download.group set in Hive (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

queryFilter
Description: search on a subset of the data based on an attribute
Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), '/STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
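
The distance-related options can be combined to bound the search and report how far away each result is. The sketch below assumes a TAB file available at the hypothetical local path /pb/data/STATECAP.TAB on every node, and it assumes that the column named by returnDistanceColumnName is exposed through the lateral view alias like the other result fields.

```sql
-- Up to 3 nearest features within 25 miles of each search point.
-- The distance is requested in a column named Miles (assumed to be selectable below).
SELECT p.id, nearestresult.capital, nearestresult.state, nearestresult.Miles
FROM search_points p
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(p.geometry, p.crs),
  '/pb/data/STATECAP.TAB',
  map('maxCandidates', '3',
      'maxDistance', '25',
      'distanceUnit', 'mi',
      'returnDistanceColumnName', 'Miles')
) nearestresult;
```

Setting maxDistance together with distanceUnit keeps the two values unambiguous; without distanceUnit, the documented default unit is meters.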


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the hexagon is always generated with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note The coordinate values must be in the CoordSysConstantslongLatWGS84 coordinate system

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
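The maxDistance test that decides whether two records are joined can be pictured with a small stand-alone check. The sketch below is not the SDK's implementation: it swaps in the haversine great-circle formula with a mean Earth radius of 3958.8 miles (both are assumptions), and the function name within_distance is hypothetical. It only illustrates what "point 2 lies within the buffer around point 1" means for WGS84 longitude/latitude values.

```shell
# Sketch only: haversine distance check, standing in for the SDK's own
# distance computation (which this guide does not describe).
# Inputs are WGS84 longitude/latitude in degrees; the radius is in miles.
within_distance() {
  # usage: within_distance lon1 lat1 lon2 lat2 max_miles
  awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" -v max="$5" 'BEGIN {
    pi = atan2(0, -1)
    rad = pi / 180
    r = 3958.8                       # assumed mean Earth radius in miles
    dlat = (lat2 - lat1) * rad
    dlon = (lon2 - lon1) * rad
    a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
    d = 2 * r * atan2(sqrt(a), sqrt(1 - a))
    if (d <= max) print "within"; else print "outside"
  }'
}

# Two points 0.005 degrees of latitude apart are roughly 0.35 miles apart.
within_distance 0 0 0 0.005 0.5
```

A check like this can be handy for spot-testing a few rows of a join result by hand.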

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.
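When the command is scripted, it can help to assemble it first and review it before submitting. The following is a minimal sketch under stated assumptions: the helper name build_submit_cmd is hypothetical, the driver class, driver jar, and version shown in the usage line are placeholders, and the function only prints the command rather than running spark-submit.

```shell
# Sketch: assemble the spark-submit command line and print it for review.
# Nothing is submitted; all argument values are supplied by the caller.
build_submit_cmd() {
  # usage: build_submit_cmd li_jar driver_class driver_jar [driver args...]
  li_jar=$1
  driver_class=$2
  driver_jar=$3
  shift 3
  printf 'spark-submit --class %s --master yarn --deploy-mode cluster --jars %s %s' \
    "$driver_class" "$li_jar" "$driver_jar"
  # Append any driver parameters in the order given.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Placeholder values for illustration only.
build_submit_cmd /pb/li/software/spark2/sdk/lib/li-sdk.jar com.example.MyDriver /tmp/driver.jar p1 p2
```

Printing the command first makes it easy to confirm that the LI jar path passed to --jars matches the install location before the job is started.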

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
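Because a PGD is ignored once the TAB data changes, a scheduled check can flag indexes that probably need to be rebuilt. The sketch below is only one possible approach and rests on assumptions: it uses file modification times as the change signal (the guide does not say how the system detects changes), the helper name pgd_status is hypothetical, and the caller supplies the PGD path and the TAB component files explicitly because the PGD file naming is not documented here.

```shell
# Sketch: report whether a PGD index looks older than the TAB files it was
# built from. Modification time is an assumed proxy for "data changed".
pgd_status() {
  # usage: pgd_status index-file tab-component-file...
  pgd=$1
  shift
  if [ ! -f "$pgd" ]; then
    echo "missing"
    return 0
  fi
  for f in "$@"; do
    # -nt is true when $f was modified more recently than the index.
    if [ "$f" -nt "$pgd" ]; then
      echo "stale"
      return 0
    fi
  done
  echo "current"
}
```

A result of "missing" or "stale" would be the cue to run PGDBuilder -f again for that TAB file.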

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
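After the last step, the result can be verified without elevated privileges. The following sketch checks only the permission bits of a directory as reported by ls; the helper name check_download_dir is hypothetical, and it does not verify group ownership or group membership.

```shell
# Sketch: confirm that a download directory has mode 775 (rwxrwxr-x).
# Only the mode string from ls is inspected; no changes are made.
check_download_dir() {
  # usage: check_download_dir /path/to/download/dir
  perms=$(ls -ld "$1" | cut -c1-10)
  if [ "$perms" = "drwxrwxr-x" ]; then
    echo "ok"
  else
    echo "fix: $perms"
  fi
}
```

For example, check_download_dir could be run against the directory named in the download location property; any output other than "ok" suggests the chmod step still needs to be applied.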

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first object's centroid is within the second object

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or sub-query

Like Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used

Arithmetic Operators

Operator Definition

+ Addition; also concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
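When query text is built in a script, the double-single-quote rule can be applied mechanically. The sketch below is an illustration only: the helper name sql_quote is hypothetical, and it handles nothing beyond doubling embedded single quotes and wrapping the value in single quotation marks.

```shell
# Sketch: turn arbitrary text into a single-quoted string literal by
# doubling every embedded single quote.
sql_quote() {
  # usage: sql_quote "text"
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Prints the literal form of a value that contains one apostrophe.
sql_quote "Joe's Diner"
```

The output can then be placed directly after the comparison operator in a WHERE clause.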

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc. All rights reserved.

Spectrum™ Location Intelligence for Big Data Architecture

Spectrum™ Location Intelligence for Big Data transforms and packages Location Intelligence components into an SDK for Big Data platforms like Hadoop, for Spark, MapReduce, and Hive.

SDK provides:

• Integration APIs for Location Intelligence
• Input datasets and metadata

API Types:

• Pre-built Spark and Hive UDF wrappers for Location Intelligence operations
• Core Location Intelligence APIs with sample MapReduce, Hive, and Spark programs (security enabled via Kerberos and Apache Sentry for Hive)

System Requirements and Dependencies

Spectrum™ Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system.

This product is verified on the following Hadoop distributions:

• Cloudera 5.12 and 6.x
• Hortonworks 3.x
• EMR 5.10 and 5.20

To use these jar files, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR, and with developing applications for distributed processing. For more information, refer to the Hortonworks, Cloudera, or EMR documentation.

To use the product, the following must be installed on your system:

for Hive:

• Hive version 1.2.1 or above

for Hive Client:

• Beeline, for example

for Spark and Zeppelin Notebook:

• Java JDK version 1.8 or above
• Hadoop version 2.6.0 or above
• Spark version 2.0 or above
• Zeppelin Notebook is not supported in Cloudera

2 - Spatial

This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files.

Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to use the big data processing systems for spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory r-tree creation and searching.

Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.

In this section

Installing the SDK 8
Hive User-Defined Spatial Functions 9
Spark 72

Installing the SDK

To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.

For the purposes of this guide, we will:

• use a user called pbuser
• install everything into /pb

Perform the following steps from a node in your cluster, such as the master node.

1. Create the install directory and give ownership to pbuser:

sudo mkdir /pb
sudo chown pbuser:pbuser /pb

2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:

/pb/temp/spectrum-bigdata-locationintelligence-<version>.zip

3. Extract the Location Intelligence distribution:

mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-<version>.zip -d /pb/li/software

4. Create an install directory on HDFS and give ownership to pbuser:

sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb

5. Upload the distribution into HDFS:

hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li

Hive User-Defined Spatial Functions

Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for Geometry operations and to work with grids in the spectrum-bigdata-li-hive-<version>.jar.

Refer to the table below to quickly navigate to Hive UDFs described in this document.

Type: Constructor Functions on page 15
Description: Construct an instance of WritableGeometry from supported geometry representation formats
Names: FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point

Type: Grid Functions on page 53
Description: Grid processing functions
Names: GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID

Type: Measurement Functions on page 29
Description: Geometry measurement functions
Names: Area, ClosestPoints, Distance, Length, Perimeter

Type: Observer Functions on page 46
Description: Geometry observer functions
Names: ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin

Type: Persistence Functions on page 20
Description: Serialize an instance of WritableGeometry to supported geometry representation formats
Names: ToGeoJSON, ToKML, ToWKB, ToWKT

Type: Predicate Functions on page 24
Description: Geometry predicate functions
Names: Disjoint, Intersects, Overlaps, Within, IsNullGeometry

Type: Processing Functions on page 39
Description: Geometry processing functions
Names: Buffer, ConvexHull, Intersection, Transform, Union

Type: Search Functions on page 66
Description: Spatial search functions
Names: LocalPointInPolygon, LocalSearchNearest

Setup

This topic assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8. To set up user-defined spatial functions for Hive, perform the following steps:

1. Proceed according to your platform.

   Cloudera:

   Copy the Hive jar for Location Intelligence to the HiveServer node:

   /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar

   In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:

   /pb/li/software/hive/lib/

   Hortonworks:

   On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:

   sudo mkdir /usr/hdp/current/hive-server2/auxlib/

   Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

   sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/

2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:

   beeline -u jdbc:hive2://localhost:10000/default -n pbuser

4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).

create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
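Because temporary functions must be registered again in every new Hive session, it can be convenient to generate the statements from a compact list and replay them from a file. The sketch below is one way to do that, not part of the product: the helper name gen_registrations and the output file name in the usage note are hypothetical, and the class names follow the com.pb.bigdata.spatial.hive.<package>.<Function> pattern used in the registration statements.

```shell
# Sketch: print one "create temporary function" statement per UDF.
# Each input line is a package name followed by the functions it contains.
gen_registrations() {
  while read -r pkg names; do
    for name in $names; do
      printf "create temporary function %s as 'com.pb.bigdata.spatial.hive.%s.%s';\n" \
        "$name" "$pkg" "$name"
    done
  done <<'EOF'
construct FromWKT FromWKB FromKML FromGeoJSON ST_Point
persistence ToWKT ToWKB ToKML ToGeoJSON
predicate Disjoint Overlaps Within Intersects IsNullGeometry
measurement Area ClosestPoints Distance Length Perimeter
processing ConvexHull Intersection Buffer Union Transform
observer ST_X ST_XMax ST_XMin ST_Y ST_YMax ST_YMin
grid GeoHashBoundary GeoHashID HexagonBoundary HexagonID SquareHashBoundary SquareHashID
search LocalSearchNearest LocalPointInPolygon
EOF
}
```

For example, the output could be redirected to a file such as register-li-udfs.hql and passed to Beeline with its -f option at the start of a session.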

Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. The job may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).

WritableGeometry

This is an implementation of Hadoops Writable interface for geometry

Spatial Hive user defined functions (UDFs) use WritableGeometry to exchange data between two functions Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT WKB GeoJSON and KML For example

To get an instance of WritableGeometry from WKT

SELECT FromWKT(tgeometryepsg4267) FROM hivetable t

To get an instance of WritableGeometry from WKB string

SELECT FromWKB(tgeometryepsg4267) FROM hivetable t

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT WKB GeoJSON and KML For example

To serialize an instance of WritableGeometry to WKT

SELECT ToWKT(tgeometry) FROM hivetable t

The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it For example

To calculate the length of a geometry

SELECT Length(FromWKT(tgeometry epsg4267) m SPHERICAL) FROM hivetable t

To get the distance between two geometries

SELECT Distance(FromWKT(tgeometryepsg4267) FromWKT(tgeometry2epsg4267) mSPHERICAL) FROM hivetable t

The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. It returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html

Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions

Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

fromGeoJSON(String jsonGeometry)

Parameters

Parameter Type Description

jsonGeometry String The geometry in GeoJSON format

Return Values

Return Type Description

WritableGeometry The geometry from GeoJSON format

Examples

SELECT FromGeoJSON('{"type": "Point", "coordinates": [100.0, 0.0]}');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

fromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The geometry in WKT format

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKT format

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

fromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The WKB of the geometry in byte array format (byte[ ])

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKB format

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
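To see what the hex literal in this example encodes, the following standalone Python sketch decodes it by hand using the public WKB layout for a 2D point: a byte-order flag, a 32-bit geometry type code, then two 64-bit doubles. It is independent of the FromWKB UDF and says nothing about how the UDF is implemented; the helper name decode_wkb_point is hypothetical.

```python
import struct


def decode_wkb_point(hex_wkb: str) -> tuple:
    """Decode a 2D WKB point given as a hex string (illustrative helper only)."""
    raw = bytes.fromhex(hex_wkb)
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    order = "<" if raw[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type code; 1 means Point.
    (geom_type,) = struct.unpack(order + "I", raw[1:5])
    if geom_type != 1:
        raise ValueError(f"not a WKB point (type code {geom_type})")
    # Bytes 5-20 hold the X and Y ordinates as IEEE 754 doubles.
    x, y = struct.unpack(order + "dd", raw[5:21])
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

The literal decodes to the point with X = 10 and Y = 10, which is the geometry the example passes to FromWKB.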

FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

fromKML(String geometry)

Parameters

Parameter Type Description

geometry String A KML string. Only the geometry, or the geometry in a placemark, is parsed.

Return Values

Return Type Description

WritableGeometry The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter Type Description

X String or Number The X ordinate

Y String or Number The Y ordinate

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified X,Y coordinates. If any of the argument values is invalid, an empty geometry is returned.

Examples

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), generated from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
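To make the relationship between Disjoint, Intersects, and Within concrete, here is a minimal Python sketch that applies the definitions given in this section to axis-aligned rectangles only. The Rect type and the function names are hypothetical and are not part of the product; real geometries need a full geometry engine, and this sketch does not model the Null-on-null-input behavior.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle, boundary included (stand-in for a geometry)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Rect, b: Rect) -> bool:
    # True if the rectangles share at least one point (touching edges count).
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def disjoint(a: Rect, b: Rect) -> bool:
    # No points in common: the exact negation of intersects.
    return not intersects(a, b)


def within(a: Rect, b: Rect) -> bool:
    # True if b entirely contains a.
    return (b.xmin <= a.xmin and a.xmax <= b.xmax
            and b.ymin <= a.ymin and a.ymax <= b.ymax)


city = Rect(2, 2, 4, 4)
county = Rect(0, 0, 10, 10)
island = Rect(20, 20, 21, 21)
print(within(city, county), intersects(city, county), disjoint(city, island))
```

Note the argument order for within: the first argument is the contained geometry and the second is the container, matching the Return Values description of Within.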

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty; otherwise, False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
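The rule stated in the Area description, exterior ring minus interior rings, can be sketched in a few lines of Python. This is an illustration using Cartesian logic and plain coordinate units only; it is not the UDF's implementation, it does no unit conversion, and the function names are hypothetical.

```python
def ring_area(ring):
    """Unsigned area of a closed ring [(x, y), ...] using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


# A 10 x 10 square with a 2 x 2 hole cut out of it.
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
print(polygon_area(square, [hole]))  # 100 - 4 = 96.0
```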

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
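The difference between the CARTESIAN and SPHERICAL computation types can be illustrated for the simplest case, two points. The Python sketch below assumes that "spherical logic" can be approximated by the haversine great-circle formula on a sphere of mean Earth radius; the guide does not state which formula or Earth model the UDF uses, so treat the numbers as indicative only. All names are hypothetical.

```python
import math

# Assumption: a spherical Earth with the mean radius, in meters.
EARTH_RADIUS_M = 6371008.8


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude is one coordinate unit in Cartesian terms,
# but roughly 111 km on the ground at the equator and less at higher latitudes.
print(cartesian_distance(0, 0, 1, 0))
print(spherical_distance_m(0, 0, 1, 0))
print(spherical_distance_m(0, 60, 1, 60))
```

This is why geographic (long/lat) coordinate systems accept only SPHERICAL: treating degrees as planar units would give distances that depend on latitude.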

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior and the holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
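As the Perimeter description notes, holes add to a polygon's perimeter rather than subtracting from it. A minimal Python sketch of that rule, using Cartesian logic and plain coordinate units (the function names are hypothetical and this is not the UDF's implementation):

```python
import math


def ring_length(ring):
    """Total length of a closed ring given as [(x, y), ...]."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(exterior, holes=()):
    """Sum of the lengths of all rings: the exterior plus every hole."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


# A 10 x 10 square with a 2 x 2 hole: 40 for the exterior plus 8 for the hole.
outer_ring = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
inner_ring = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
print(polygon_perimeter(outer_ring, [inner_ring]))  # 48.0
```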

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
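The effect of the resolution parameter is easiest to see for a point buffer. The Python sketch below builds the ring in plain Cartesian coordinates; it assumes the vertices are placed on the circle itself, which the guide does not specify, and the function name is hypothetical.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate the circle of radius `offset` around (x, y) with
    `resolution` straight-line segments; returns a closed ring."""
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = buffer_point(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)  # number of segments: 8
```

With a resolution of 12, as in the examples above, the same call would yield a twelve-sided ring, a closer but more expensive approximation of the circle.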

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
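For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of points with Andrew's monotone chain algorithm. This is a textbook technique chosen for illustration; the guide does not say which algorithm the ConvexHull UDF uses, and the function name is hypothetical.

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# The interior point (2, 2) is not part of the hull.
print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]))
```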

Intersection

Description

The Intersection function returns the geometry (such as a point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
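To give a feel for what a transformation between the two coordinate systems in these examples involves, the Python sketch below applies the widely published spherical Mercator formulas that map EPSG:4326 longitude/latitude to EPSG:3857 meters. It covers only this one pair of systems and is not the Transform UDF's implementation; the function name is hypothetical.

```python
import math

# Radius used by spherical (web) Mercator: the WGS 84 semi-major axis, in meters.
WGS84_SEMI_MAJOR_AXIS_M = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# The same point as in the ST_POINT(30, 20) example above.
print(lonlat_to_web_mercator(30, 20))
```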

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that an XY table cannot be transformed from one coordinate system to another with Transform alone. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
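The bounds-filtering use case mentioned above can be sketched in Python as follows. The mbr function plays the role of the four ST_XMin, ST_YMin, ST_XMax, and ST_YMax values, and in_mbr is the comparison that a WHERE clause would express on the X and Y columns. Both names and the sample data are hypothetical.

```python
def mbr(points):
    """Minimum bounding rectangle of a geometry's vertices: (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, bounds):
    """True if the point (x, y) lies inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax


triangle = [(1.0, 2.0), (5.0, 1.0), (4.0, 6.0), (1.0, 2.0)]
bounds = mbr(triangle)

# Rows of an XY table: only the second row falls outside the MBR.
rows = [(2.0, 3.0), (6.0, 3.0), (4.5, 1.5)]
kept = [row for row in rows if in_mbr(row[0], row[1], bounds)]
print(bounds, kept)
```

An MBR test is only a coarse filter: a row that passes it can still lie outside the geometry itself, so an exact predicate such as Within is still needed when precision matters.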

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT('…', 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT('…', 'epsg:4326')) FROM src;
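The four extent functions (ST_XMin, ST_XMax, ST_YMin, ST_YMax) can be pictured as the minimum and maximum of a geometry's vertex ordinates. The Python sketch below illustrates only that idea; it is not the product's implementation. The function name `extents` and the list-of-tuples geometry representation are hypothetical, chosen for illustration.

```python
from typing import Optional, Sequence, Tuple

Coordinate = Tuple[float, float]  # (x, y), for example (longitude, latitude)


def extents(
    vertices: Optional[Sequence[Coordinate]],
) -> Optional[Tuple[float, float, float, float]]:
    """Return (xmin, ymin, xmax, ymax) for a vertex list, or None if there is no geometry."""
    if not vertices:
        # Mirrors the documented behaviour of returning Null for a non-geometry input.
        return None
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


# A small triangle; the coordinates are made up for illustration.
triangle = [(-73.75, 42.73), (-73.70, 42.80), (-73.65, 42.70)]
print(extents(triangle))
print(extents(None))
```

Each element of the returned tuple corresponds to one of the four functions, so a single pass over the vertices yields the full bounding box.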

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid.

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point.

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
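To make the relationship between a point, a precision, a cell ID, and a cell boundary concrete, here is a minimal Python sketch of the widely used public geohash scheme. It assumes, without confirmation from this guide, that GeoHashID and GeoHashBoundary follow that scheme; the function names `geohash_id` and `geohash_boundary` and the sample points are hypothetical.

```python
from collections import Counter
from typing import Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_id(x: float, y: float, precision: int) -> str:
    """Encode longitude x and latitude y into a geohash of `precision` characters."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude and latitude, starting with longitude.
        rng, coord = (lon_range, x) if use_lon else (lat_range, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_boundary(cell_id: str) -> Tuple[float, float, float, float]:
    """Decode a geohash into its cell rectangle (xmin, ymin, xmax, ymax)."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (lon_range[0], lat_range[0], lon_range[1], lat_range[1])


# Count points per cell, analogous to the GROUP BY query above.
points = [(-5.6, 42.6), (-5.61, 42.61), (10.0, 50.0)]  # made-up sample points
counts = Counter(geohash_id(x, y, 4) for x, y in points)
print(geohash_id(-5.6, 42.6, 5), geohash_boundary("ezs42"), counts)
```

Each extra character adds five bits of subdivision, which is why a longer ID means a smaller cell, and why IDs sharing a prefix lie in the same coarser cell.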

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier (hexagon hash) of a cell in a grid.

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier (square hash) of a cell in a grid.

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS:

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
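As a conceptual aid, the following Python sketch shows one common way to decide whether a point falls inside a polygon: the ray-casting (even-odd) test. This guide does not say which algorithm the UDTF uses, so this is a substitute technique for illustration only. It handles simple polygon rings only, treats points exactly on an edge inconsistently, and the names `point_in_ring` and `local_point_in_polygon` and the sample features are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Feature = Tuple[Sequence[Point], Dict[str, str]]  # (polygon ring, attribute row)


def point_in_ring(point: Point, ring: Sequence[Point]) -> bool:
    """Ray-casting test: True if `point` lies inside the ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def local_point_in_polygon(point: Point, features: List[Feature]) -> List[Dict[str, str]]:
    """Return the attribute rows of every polygon feature that contains the point."""
    return [attrs for ring, attrs in features if point_in_ring(point, ring)]


# Two made-up square "states" standing in for rows of a polygon table.
features: List[Feature] = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], {"state": "A", "capital": "Alpha"}),
    ([(20, 0), (30, 0), (30, 10), (20, 10)], {"state": "B", "capital": "Beta"}),
]
print(local_point_in_polygon((5, 5), features))
print(local_point_in_polygon((15, 5), features))
```

Like the LATERAL VIEW query, the helper returns zero, one, or several attribute rows per input point, depending on how many polygons contain it.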

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries, contained in the specified TAB or shapefile, to the input point.

Examples

Using HDFS:

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
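The interaction of the maxCandidates and maxDistance options can be sketched in a few lines of Python. This is an illustration of the option semantics described above, not the product's search algorithm: it uses a brute-force scan with great-circle (haversine) distances in meters, and the names `haversine_m`, `search_nearest`, and the sample candidates are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters (spherical approximation)

Candidate = Tuple[float, float, Dict[str, str]]  # (longitude, latitude, attributes)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two longitude/latitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(
    point: Tuple[float, float],
    candidates: List[Candidate],
    max_candidates: int = 1,  # mirrors the documented default of 1
    max_distance_m: Optional[float] = None,  # None mirrors "no limit"
) -> List[Tuple[float, Dict[str, str]]]:
    """Return up to max_candidates (distance, attributes) pairs, nearest first."""
    scored = []
    for lon, lat, attrs in candidates:
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, attrs))
    scored.sort(key=lambda pair: pair[0])  # sort by distance only
    return scored[:max_candidates]


# Made-up candidates at increasing distance north of the search point.
candidates: List[Candidate] = [
    (0.0, 0.02, {"name": "B"}),
    (0.0, 0.01, {"name": "A"}),
    (0.0, 1.0, {"name": "C"}),
]
for dist, attrs in search_nearest((0.0, 0.0), candidates, max_candidates=3, max_distance_m=5000):
    print(attrs["name"], round(dist))
```

The distance filter is applied before the candidate limit, so a far-away feature never takes a slot from a nearer one; whether the product applies the options in that order is an assumption here.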

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, each hexagon is generated with a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to.

df1Longitude Column The longitude value from the first dataframe.

df1Latitude Column The latitude value from the first dataframe.

df2Longitude Column The longitude value from the second dataframe.

df2Latitude Column The latitude value from the second dataframe.

maxDistance Length The buffer length around point 1 to search for point 2.

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
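To show what a distance join produces, independent of Spark, here is a small Python sketch using plain dictionaries as rows. It is a brute-force nested loop with haversine distances in miles, not the geohash-based primary join that the geohashPrecision parameter implies, and it is not the SDK's implementation. The names `haversine_mi`, `join_by_distance`, and the sample rows are hypothetical.

```python
import math
from typing import Dict, List, Optional

EARTH_RADIUS_MI = 3958.7613  # mean earth radius in miles (spherical approximation)

Row = Dict[str, object]


def haversine_mi(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in miles between two longitude/latitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(
    rows1: List[Row], rows2: List[Row],
    lon1: str, lat1: str, lon2: str, lat2: str,
    max_distance_mi: float,
    distance_column: Optional[str] = None,
) -> List[Row]:
    """Pair every row of rows1 with each row of rows2 that lies within max_distance_mi."""
    joined = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_mi(r1[lon1], r1[lat1], r2[lon2], r2[lat2])
            if d <= max_distance_mi:
                row = {**r1, **r2}  # combine the columns of both rows
                if distance_column:
                    row[distance_column] = d  # plays the role of DistanceColumnName
                joined.append(row)
    return joined


# Made-up sample data: one customer and two points of interest.
customers = [{"customer": "c1", "longitude": -73.75, "latitude": 42.73}]
pois = [
    {"poi": "near", "lon": -73.752, "lat": 42.731},
    {"poi": "far", "lon": -73.0, "lat": 42.0},
]
result = join_by_distance(customers, pois, "longitude", "latitude", "lon", "lat",
                          0.5, distance_column="outputDistance")
print(result)
```

A nested loop compares every pair of rows, which is why a real implementation first narrows the candidates, for example by grouping both sides into grid cells, before computing exact distances.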

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter       Required   Description

-f <file>       yes        Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel>   no         The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
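When several TAB files need an index, the documented command line can be generated in a loop. The following is a minimal sketch, assuming PGDBuilder is invoked exactly as in the Usage line above and that paths contain no spaces. The helper name pgd_command, the cap of 4, and the /data directory are hypothetical. The sketch only prints the commands so that they can be reviewed before they are run.

```shell
#!/bin/sh
# Sketch: print one PGDBuilder command per TAB file, capping -p.
# pgd_command is a hypothetical helper, not part of the product.
pgd_command() {
  tab=$1
  cap=$2
  cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  # Use the smaller of the CPU count and the requested cap.
  if [ "$cpus" -lt "$cap" ]; then parallel=$cpus; else parallel=$cap; fi
  printf 'PGDBuilder -f %s -p %s\n' "$tab" "$parallel"
}

# Hypothetical data directory; nothing is printed if it holds no .tab files.
for tab in /data/*.tab; do
  [ -e "$tab" ] || continue
  pgd_command "$tab" 4
done
```

Capping -p below the CPU count follows the guidance above for seamless TABs, where one thread per sub-TAB can otherwise saturate the machine.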

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
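After the last step, the result can be checked before any job relies on it. The following is a minimal sketch, assuming GNU stat (as found on Linux cluster nodes). The helper name check_download_dir is hypothetical, and the throwaway directory stands in for the download directory.

```shell
#!/bin/sh
# Sketch: verify that a directory has the expected group and mode 775.
# Assumes GNU coreutils stat (-c); check_download_dir is a hypothetical helper.
check_download_dir() {
  dir=$1
  group=$2
  mode=$(stat -c '%a' "$dir") || return 1
  dir_group=$(stat -c '%G' "$dir") || return 1
  [ "$mode" = "775" ] && [ "$dir_group" = "$group" ]
}

# Demonstration on a throwaway directory owned by the current user.
demo=$(mktemp -d)
chmod 775 "$demo"
if check_download_dir "$demo" "$(stat -c '%G' "$demo")"; then
  echo "group and permissions look correct"
fi
```

On a cluster node, the call would take the real download directory and the common group name, for example the pbdownloads group created in step 1.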

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator              Definition

Attribute operators   =, <>, !=, <, <=, >, >=

Between               Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects   Returns true if the envelopes (MBRs) of the operands intersect.

Contains              Returns true if the first object contains all of the second object.

Within                Returns true if the first object is entirely inside the second object.

ContainsCentroid      Returns true if the first object contains the centroid of the second object.

CentroidWithin        Returns true if the first object's centroid is within the second object.

Intersects            Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List)             Returns true if the value equals at least one of the values in the literal list or subquery.

Like                  Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND                   Returns true if both conditions in the WHERE clause are true.

OR                    Returns true if either the first or second condition is true.

NOT                   Reverses the meaning of the logical operator with which it is used.
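The wildcard behavior described for the Like operator can be illustrated outside MI SQL. The following is a minimal sketch that maps the two wildcards onto shell glob patterns (underscore to ?, percent to *). The helper name like_match is hypothetical, and the mapping is only an approximation: it does not handle patterns that already contain shell glob characters, and it says nothing about how the real operator treats letter case.

```shell
#!/bin/sh
# Sketch: emulate the two Like wildcards with shell globs (_ -> ?, % -> *).
# like_match is a hypothetical helper, not part of MI SQL.
like_match() {
  value=$1
  glob=$(printf '%s' "$2" | tr '_%' '?*')
  # An unquoted variable in a case pattern is treated as a glob.
  case $value in
    $glob) return 0 ;;
    *)     return 1 ;;
  esac
}

like_match "Smith" "Sm_th" && echo "Sm_th matches Smith"
like_match "Smith" "S%"    && echo "S% matches Smith"
like_match "Smith" "S_th"  || echo "S_th does not match Smith"
```

The last call fails to match because each underscore stands for exactly one character, so the pattern S_th can only match four-character values.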

Arithmetic Operators

Operator   Definition

+          Addition; also concatenation operator. NOTE: String concatenation also uses &.

-          Subtraction

*          Multiplication

/          Division

^          Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter   Definition

( )         Expression delimiters

' '         String constant delimiters. See Quote Rules.

" "         Quoted identifier delimiters

% _         Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

,           List items and function argument separators

@           Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers that contain characters that would not normally be allowed are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
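When queries are assembled in a script, the doubling rule for single quotes can be applied mechanically. The following is a minimal sketch. The helper name sql_quote is hypothetical, and the Streets table and Business column are taken from the example above.

```shell
#!/bin/sh
# Sketch: wrap a value in single quotes, doubling any embedded single quote.
# sql_quote is a hypothetical helper, not part of the product.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

name="O'hara's"
printf 'SELECT * FROM Streets WHERE Business = %s\n' "$(sql_quote "$name")"
```

The helper covers only the string-literal rule. Identifiers that need double quotes must be handled separately.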

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Welcome

What is Spectrumtrade Location Intelligence for Big Data

The Pitney Bowes Spectrumtrade Location Intelligence for Big Data is a toolkit for processing enterprise data for large scale spatial analysis Billions of records can be processed in parallel using MapReduce Hive and Apache Sparks cluster processing framework yielding results faster than ever Unlike traditional processing techniques that used to take weeks to process the data now the data processing can be done in a few hours using this product

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 4

Welcome

Spectrumtrade Location Intelligence for Big Data Architecture

What is Spectrumtrade Location Intelligence for Big Data

The Spectrumtrade Location Intelligence for Big Data transforms and packages Location Intelligence components into an SDK for Big Data platforms like Hadoop for Spark MapReduce and Hive

SDK provides

bull Integration APIs for Location Intelligence bull Input datasets and metadata

API Types

bull Pre-built Spark and Hive UDF wrappers for Location Intelligence operations bull Core Location Intelligence APIs with sample MapReduce Hive and Spark programs (security enabled via Kerberos and Apache Sentry for Hive)

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 5

Welcome

System Requirements and Dependencies

Spectrumtrade Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system

This product is verified on the following Hadoop distributions

bull Cloudera 512 and 6x bull Hortonworks 3x bull EMR 510 and 520

To use these jar files you must be familiar with configuring Hadoop in Hortonworks Cloudera or EMR and developing applications for distributed processing For more information refer to Hortonworks Cloudera or EMR documentation

To use the product the following must be installed on your system

for Hive

bull Hive version 121 or above

for Hive Client

bull Beeline for example

for Spark and Zeppelin Notebook

bull Java JDK version 18 or above bull Hadoop version 260 or above bull Spark version 20 or above bull Zeppelin Notebook is not supported in Cloudera

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 6

2 - Spatial This section describes the Spark jobs and Hive user defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files

Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to use the big data processing systems for spatial data analysis The LI SDK provides geometry and coordinate operations the ability to read TAB files and in-memory r-tree creation and searching

Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive

In this section

Installing the SDK 8 Hive User-Defined Spatial Functions 9 Spark 72

Spatial

Installing the SDK

To use spatial functions for Spectrumtrade Location Intelligence for Big Data the Hadoop cluster must have reference data and libraries accessible from the master node

For the purposes of this guide we will

bull use a user called pbuser bull install everything into pb

Perform the following steps from a node in your cluster such as the master node

1 Create the install directory and give ownership to pbuser

sudo mkdir pbsudo chown pbuserpbuser pb

2 Add the Location Intelligence distribution zip to the node at a temporary location for example

pbtempspectrum-bigdata-locationintelligence-versionzip

3 Extract the Location Intelligence distribution

mkdir pblimkdir pblisoftwareunzip pbtempspectrum-bigdata-locationintelligence-versionzip -d pblisoftware

4 Create an install directory on hdfs and give ownership to pbuser

sudo -u hdfs hadoop fs -mkdir -p hdfspbli sudo -u hdfs hadoop fs -chown -R pbuserpbuser hdfspb

5 Upload the distribution into HDFS

hadoop fs -copyFromLocal pblisoftware hdfspbli

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 8

Spatial

Hive User-Defined Spatial Functions

Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax so there is no need to write code Spectrumtrade Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user defined functions for Geometry operations and to work with grids in the spectrum-bigdata-li-hive-ltversiongtjar

Refer to the table below to quickly navigate to Hive UDFs described in this document

Type Description Name

Constructor Functions on page 15 Construct an instance of WritableGeometry FromGeoJSON

from supported geometry representation FromKML

formats

FromWKB

FromWKT

ST_Point

Grid Functions on page 53 Grid processing functions GeoHashBoundary

GeoHashID

HexagonBoundary

HexagonID

SquareHashBoundary

SquareHashID

Measurement Functions on page 29 Geometry measurement functions Area

ClosestPoints

Distance

Length

Perimeter

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 9

Spatial

Type Description Name

Observer Functions on page 46 Geometry observer functions ST_X

ST_Y

ST_XMax

ST_XMin

ST_YMax

ST_YMin

Persistence Functions on page 20 Serialize an instance of WritableGeometry ToGeoJSON

to supported geometry representation ToKML

formats

ToWKB

ToWKT

Predicate Functions on page 24 Geometry predicate functions Disjoint

Intersects

Overlaps

Within

IsNullGeometry

Processing Functions on page 39 Geometry processing functions Buffer

ConvexHull

Intersection

Transform

Union

Search Functions on page 66 Spatial search functions LocalPointInPolygon

LocalSearchNearest

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 10

Spatial

Setup

This topic assumes the product is installed to pblisoftware as described in Installing the SDK on page 8 To set up user-defined spatial functions for Hive perform the following steps

1 Proceed according to your platform

On this Do this platform

Cloudera Copy the Hive jar for Location Intelligence to the HiveServer node

pblisoftwarehivelibspectrum-bigdata-li-hive-versionjar

In Cloudera Manager navigate to the Hive Configuration page Search for the Hive Auxiliary JARs Directory setting If the value is already set then move the Hive jar into the specified folder If the value is not set then set it to the parent folder of the Hive jar

pblisoftwarehivelib

Hortonworks On the HiveServer2 node create the Hive auxlib folder if one does not already exist

sudo mkdir usrhdpcurrenthive-server2auxlib

Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node

sudo cp pblisoftwarehivelibspectrum-bigdata-li-hive-versionjarusrhdpcurrenthive-server2auxlib

2 Restart all Hive services 3 Launch Beeline or some other Hive client for the remaining step

beeline -u jdbchive2localhost10000default -n pbuser

4 Register spatial user-defined functions Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session)

create temporary function FromWKT as compbbigdataspatialhiveconstructFromWKT create temporary function FromWKB as compbbigdataspatialhiveconstructFromWKB create temporary function FromKML as compbbigdataspatialhiveconstructFromKML create temporary function FromGeoJSON as compbbigdataspatialhiveconstructFromGeoJSON create temporary function ST_Point as compbbigdataspatialhiveconstructST_Point

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 11

Spatial

create temporary function ToWKT as compbbigdataspatialhivepersistenceToWKT create temporary function ToWKB as compbbigdataspatialhivepersistenceToWKB create temporary function ToKML as compbbigdataspatialhivepersistenceToKML create temporary function ToGeoJSON as compbbigdataspatialhivepersistenceToGeoJSON

create temporary function Disjoint as compbbigdataspatialhivepredicateDisjoint create temporary function Overlaps as compbbigdataspatialhivepredicateOverlaps create temporary function Within as compbbigdataspatialhivepredicateWithin create temporary function Intersects as compbbigdataspatialhivepredicateIntersectscreate temporary function IsNullGeometry as compbbigdataspatialhivepredicateIsNullGeometry

create temporary function Area as compbbigdataspatialhivemeasurementAreacreate temporary function ClosestPoints as compbbigdataspatialhivemeasurementClosestPoints create temporary function Distance as compbbigdataspatialhivemeasurementDistance create temporary function Length as compbbigdataspatialhivemeasurementLength create temporary function Perimeter as compbbigdataspatialhivemeasurementPerimeter

create temporary function ConvexHull as compbbigdataspatialhiveprocessingConvexHull create temporary function Intersection as compbbigdataspatialhiveprocessingIntersection create temporary function Buffer as compbbigdataspatialhiveprocessingBuffer create temporary function Union as compbbigdataspatialhiveprocessingUnion create temporary function Transform as compbbigdataspatialhiveprocessingTransform

create temporary function ST_X as compbbigdataspatialhiveobserverST_X create temporary function ST_XMax as compbbigdataspatialhiveobserverST_XMax create temporary function ST_XMin as compbbigdataspatialhiveobserverST_XMin create temporary function ST_Y as compbbigdataspatialhiveobserverST_Y create temporary function ST_YMax as compbbigdataspatialhiveobserverST_YMax create temporary function ST_YMin as compbbigdataspatialhiveobserverST_YMin

create temporary function GeoHashBoundary as compbbigdataspatialhivegridGeoHashBoundary create temporary function GeoHashID as compbbigdataspatialhivegridGeoHashID create temporary function HexagonBoundary as compbbigdataspatialhivegridHexagonBoundary create temporary function HexagonID as compbbigdataspatialhivegridHexagonIDcreate temporary function SquareHashBoundary as compbbigdataspatialhivegridSquareHashBoundarycreate temporary function SquareHashID as compbbigdataspatialhivegridSquareHashID

create temporary function LocalSearchNearest as compbbigdataspatialhivesearchLocalSearchNearestcreate temporary function LocalPointInPolygon as compbbigdataspatialhivesearchLocalPointInPolygon

Note If you want to view the complete stack trace for any encountered error enable logging in DEBUG mode and then restart the job execution

bull The first time you run a job may take a while if the reference data has to be downloaded remotely from HDFS or S3 It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3 If you are using Hive with the MapReduce engine you can adjust the value of the mapreducetasktimeout property

bull Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints To process these queries we recommend increasing the amount of memory available to the HiveServer2 process (for example by setting HADOOP_HEAPSIZE in hive-envsh)

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 12

Spatial

WritableGeometry

This is an implementation of Hadoops Writable interface for geometry

Spatial Hive user defined functions (UDFs) use WritableGeometry to exchange data between two functions Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT WKB GeoJSON and KML For example

To get an instance of WritableGeometry from WKT

SELECT FromWKT(tgeometryepsg4267) FROM hivetable t

To get an instance of WritableGeometry from WKB string

SELECT FromWKB(tgeometryepsg4267) FROM hivetable t

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT WKB GeoJSON and KML For example

To serialize an instance of WritableGeometry to WKT

SELECT ToWKT(tgeometry) FROM hivetable t

The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it For example

To calculate the length of a geometry

SELECT Length(FromWKT(tgeometry epsg4267) m SPHERICAL) FROM hivetable t

To get the distance between two geometries

SELECT Distance(FromWKT(tgeometryepsg4267) FromWKT(tgeometry2epsg4267) mSPHERICAL) FROM hivetable t

The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry The IsNullGeometry function returns true if the geometry is NULL or empty otherwise it returns false The result type is a boolean

SELECT IsNullGeometry(null)

SELECT IsNullGeometry(FromWKT(POINT(10 20)))

SELECT IsNullGeometry(ST_Point(x y epsg4326)) FROM src

For more information see httpshadoopapacheorgdocsr311apiorgapachehadoopioWritablehtml

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 13

Spatial

Geometry Functions

bull Constructor Functions bull Persistence Functions bull Predicate Functions bull Measurement Functions bull Processing Functions bull Observer Functions bull Grid Functions

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 14

Spatial

Constructor Functions

The following Constructor functions are available

bull FromGeoJSON bull FromWKT bull FromWKB bull FromKML bull Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry

Function Registration

create function FromGeoJSON as compbbigdataspatialhiveconstructFromGeoJSON

Syntax

fromGeoJSON(String jsonGeometry)

Parameters

Parameter Type Description

jsonGeometry String The geometry in geoJSON format

Return Values

Return Type Description

WritableGeometry The geometry from geoJSON format

Examples

SELECT FromGeoJSON( type Point coordinates [1000 00] )

SELECT FromGeoJSON(tgeometry) FROM hivetable t

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 15

Spatial

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry

Function Registration

create function FromWKT as compbbigdataspatialhiveconstructFromWKT

Syntax

fromWKT(String geometry [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The geometry in WKT format

CRS String Optional The coordinate system for the geometry Default = EPSG4326

Return Values

Return Type Description

WritableGeometry The geometry from WKT format

Examples

SELECT FromWKT(tgeometry) FROM hivetable t

SELECT FromWKT(tgeometryepsg4267) FROM hivetable t

SELECT FromWKT (POINT (30 20) epsg4267)

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 16

Spatial

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry

Function Registration

create function FromWKB as compbbigdataspatialhiveconstructFromWKB

Syntax

fromWKB(String geometry [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The WKB of the geometry in byte array format (byte[ ])

CRS String Optional The coordinate system for the geometry Default = EPSG4326

Return Values

Return Type Description

WritableGeometry The geometry from WKB format

Examples

SELECT FromWKB(unhex(010100000000000000000024400000000000002440) epsg4326)

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 17

Spatial

FromKML

Description

The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language)

Function Registration

create function FromKML as compbbigdataspatialhiveconstructFromKML

Syntax

fromKML(String geometry)

Parameters

Parameter Type Description

geometry String A KML string where only the geometry or geometry in placemark will be parsed

Return Values

Return Type Description

WritableGeometry The geometry from KML format

Examples

SELECT FromKML(tgeometry) FROM hivetable t

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 18

X

Spatial

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS

Function Registration

create function ST_Point as compbbigdataspatialhiveconstructST_Point

Syntax

ST_Point(String|Number X String|Number Y [SpatialInfo CRS])

Parameters

Parameter Type Description

String or Number The X ordinate

Y String or Number The Y ordinate

CRS String Optional The coordinate system for the geometry Default = EPSG4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified XY coordinates If any of the argument values are invalid then an empty geometry will be returned in the output

Examples

SELECT ST_Point(-73750333 42736103 epsg4326)

SELECT ST_Point(-73750333 42736103 epsg4326)

SELECT ST_Point(-73750333 42736103)

SELECT ST_Point(-73750333 42736103)

SELECT ST_Point(px py pcrs) FROM points p

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 19

Spatial

Persistence Functions

The following Persistence functions are available

bull ToGeoJSON bull ToWKT bull ToWKB bull ToKML

ToGeoJSON

Description

The ToGeoJSON function returns a text formatted in GeoJSON representation of geometry from the specified WritableGeometry instance

Function Registration

create function ToGeoJSON as compbbigdataspatialhivepersistenceToGeoJSON

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(tgeometry) 50 km 12 SPHERICAL )) FROMhivetable t

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 20

Spatial

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance

Function Registration

create function ToWKT as compbbigdataspatialhivepersistenceToWKT

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(tgeometry) 50 km 12 SPHERICAL )) FROM hivetablet

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 21

Spatial

ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance

Function Registration

create function ToWKB as compbbigdataspatialhivepersistenceToWKB

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(tgeometry) 50 km 12 SPHERICAL )) FROM hivetablet

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 22

Spatial

ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.

Return Values

Return Type | Description
String | The KML representation of a geometry, expressed as a hexadecimal encoded string.

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
Boolean | True if the two geometry objects have no points in common; otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
Boolean | True if there is any direct position in common between the two geometries; otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
Boolean | True if geometry1 overlaps geometry2; otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
Boolean | True if geometry2 entirely contains geometry1; otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter | Type | Description
inputGeometry | WritableGeometry | The input geometry to be checked for a null or empty value.

Return Values

Return Type | Description
Boolean | True if the geometry is null or empty; otherwise False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
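The exterior-minus-interior rule can be pictured with a small Python sketch. It assumes planar (Cartesian) coordinates and uses the shoelace formula; the names `ring_area` and `polygon_area` are hypothetical and are not part of the SDK, whose spherical computation is not reproduced here.

```python
def ring_area(ring):
    """Unsigned planar area of a ring via the shoelace formula."""
    total = 0.0
    # Pair every vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


# A 10 x 10 square with a 2 x 2 hole.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(square, [hole]))  # 96.0
```

A point or a curve encloses no ring, which is consistent with the zero area stated above.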

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
areaUnits | String | The desired return unit type. For valid values, see Area Units below.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    • For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default).
    • For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default).
    • For engineering coordinate systems: valid type = CARTESIAN (default).
    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for the unit are the following area units:

Value | Description
sq mi | square miles
sq km | square kilometers
sq in | square inches
sq ft | square feet
sq yd | square yards
sq mm | square millimeters
sq cm | square centimeters
sq m | square meters
sq survey ft | square US Survey feet
sq nmi | square nautical miles
acre | acres
ha | hectares

Return Values

Return Type | Description
Double | The area of the geometry.

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
Array<WritableGeometry> | The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2
FROM hivetable t
LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units below.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    • For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default).
    • For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default).
    • For engineering coordinate systems: valid type = CARTESIAN (default).
    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.
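To show why the computation type matters for long/lat data, the following Python sketch compares a planar distance with a great-circle distance between two points. This guide does not describe the SDK's spherical logic, so the haversine formula and the mean Earth radius used here are assumptions for illustration only, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; an assumption for this sketch


def cartesian_distance(p, q):
    """Straight-line distance, treating coordinates as planar x/y values."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_m(p, q):
    """Great-circle (haversine) distance in meters between two
    (longitude, latitude) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


a, b = (0.0, 0.0), (90.0, 0.0)
print(cartesian_distance(a, b))    # 90.0, a number of degrees, not a ground distance
print(spherical_distance_m(a, b))  # a quarter of the sphere's circumference, in meters
```

Applying Cartesian logic to long/lat coordinates yields a value in degrees, which helps explain why only SPHERICAL is listed as valid for geographic coordinate systems.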

Linear Units

The following table lists the valid values for unit type:

Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
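The unit codes relate to one another through fixed conversion factors. The Python sketch below uses the standard definitions (for example, 1 international mile = 1609.344 m, 1 US Survey foot = 1200/3937 m, 1 nautical mile = 1852 m); the dictionary and the function name are hypothetical helpers, not part of the SDK.

```python
# Meters per unit, keyed by the linear unit codes listed above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the unit codes above."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert_length(50, "km", "m"))  # 50000.0
print(convert_length(1, "mi", "km"))  # roughly 1.609
```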

Return Values

Return Type | Description
Double | The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units below.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    • For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default).
    • For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default).
    • For engineering coordinate systems: valid type = CARTESIAN (default).
    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles

Return Values

Return Type | Description
Double | The length of the geometry.

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units below.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    • For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default).
    • For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default).
    • For engineering coordinate systems: valid type = CARTESIAN (default).
    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles

Return Values

Return Type | Description
Double | The perimeter of the geometry.

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The geometry to buffer.
offset | Number | The distance from the input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units below.
resolution | Number | Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    • For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default).
    • For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default).
    • For engineering coordinate systems: valid type = CARTESIAN (default).
    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.
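The effect of the resolution parameter can be pictured with a planar Python sketch that approximates a circle around a point with a given number of straight-line segments. The function name `buffer_point` is hypothetical; the SDK's Buffer also handles non-point geometries and spherical computation, neither of which is reproduced here.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with
    `resolution` straight-line segments, returned as a closed ring."""
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = buffer_point(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)  # 8 distinct vertices, so the buffer is an octagon
```

A higher resolution yields more vertices per buffer, which is consistent with the note that larger values take more time and space to compute.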

Linear Units

Valid values for unit type:

Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles

Return Values

Return Type | Description
WritableGeometry | A geometry consisting of all the points within the offset distance of the input geometry.

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
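As an illustration of the definition, the following Python sketch computes the convex hull of a set of planar points with Andrew's monotone chain algorithm. This guide does not state which algorithm the SDK uses, so this is only one common approach, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(convex_hull(points))  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

The interior points (2, 2) and (1, 3) do not appear in the result because the hull keeps only the outermost vertices.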

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
WritableGeometry | The convex hull of the geometry.

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));

Intersection

Description

The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
WritableGeometry | The geometry formed from the direct positions that are common to both input geometries.

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.
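To give a sense of what such a transformation computes, the Python sketch below projects a longitude/latitude pair from epsg:4326 to epsg:3857 using the widely published spherical Web Mercator formula. It covers only this one pair of coordinate systems; the function name is hypothetical, and the SDK's own implementation is not described in this guide.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by epsg:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees (epsg:4326) to
    Web Mercator x/y in meters (epsg:3857)."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon_deg)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


print(lonlat_to_web_mercator(30.0, 20.0))
```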

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The source input geometry.
CRS | String | The destination coordinate system for the geometry.

Return Values

Return Type | Description
WritableGeometry | The geometry transformed to the destination coordinate system.

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
WritableGeometry | The geometry that represents the union of the input geometries.

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a writable geometry.
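The idea of filtering XY records by the bounds of a geometry can be sketched in Python as follows. The sketch treats a geometry as a plain list of (x, y) vertices; the helper names `mbr` and `within_bounds` are hypothetical and only mirror what the four min/max UDFs are described as returning.

```python
def mbr(coords):
    """Minimum bounding rectangle of (x, y) vertices, returned as
    (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def within_bounds(x, y, bounds):
    """True if the point lies inside or on the rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax


triangle = [(40, 40), (20, 45), (45, 30)]
bounds = mbr(triangle)
print(bounds)  # (20, 30, 45, 45)

# Keep only the XY records that fall inside the rectangle.
records = [(25, 35), (50, 35), (30, 44)]
print([r for r in records if within_bounds(r[0], r[1], bounds)])
# [(25, 35), (30, 44)]
```

A bounds test of this kind is only a coarse filter: a record inside the rectangle is not necessarily inside the geometry itself.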

The following Observer index functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The X maxima of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The X minima of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The Y maxima of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The Y minima of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
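How an ID maps back to a rectangular cell can be sketched in Python with the publicly documented geohash scheme (base-32 characters, bits alternating between longitude and latitude). That the SDK's IDs follow exactly this scheme is an assumption, and `geohash_bounds` is a hypothetical helper, not an SDK function.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(cell_id):
    """Decode a standard geohash string to (xmin, ymin, xmax, ymax)."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    is_lon = True  # bits alternate, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # bit 1: keep the upper half
            else:
                rng[1] = mid  # bit 0: keep the lower half
            is_lon = not is_lon
    return lon[0], lat[0], lon[1], lat[1]


xmin, ymin, xmax, ymax = geohash_bounds("ezs42")
print(xmin, ymin, xmax, ymax)
# -5.625 42.5830078125 -5.5810546875 42.626953125
```

Each additional character halves the cell several more times, so longer IDs describe smaller rectangles.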

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
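The relationship between X, Y, precision, and the resulting ID can be sketched in Python with the publicly documented geohash encoding. Whether the SDK produces exactly these strings is an assumption, and `geohash_id` is a hypothetical helper used only for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a standard geohash string
    of `precision` characters."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    chars, value, bit_count, is_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon, x) if is_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        is_lon = not is_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


print(geohash_id(-5.6, 42.6, 5))  # ezs42
print(geohash_id(-5.6, 42.6, 3))  # ezs (shorter ID, larger cell)
```

Because a shorter ID is a prefix of the longer one for the same point, sorting IDs as strings tends to keep nearby cells together, which is one reason such IDs are convenient for grouping.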

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The geohash ID of the grid cell at the specified precision that contains the point.

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique well-known string ID for the grid cell The ID then is sortable and searchable that corresponds to the specified X Y and precision

Function Registration

create function HexagonID as compbbigdataspatialhivegridHexagonID

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
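The relationship between the precision argument and cell size can be illustrated with the standard public geohash scheme. This is only an analogy: the guide does not document how HexagonID or SquareHashID build their strings, so the Python sketch below is not the product's algorithm, and the names geohash_encode and cell_size_degrees are hypothetical. It shows that a longer key identifies a smaller cell and that a shorter key is a prefix of the longer one for the same point.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a WGS 84 point as a standard geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def cell_size_degrees(precision):
    """Return (longitude span, latitude span) in degrees of one cell."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


if __name__ == "__main__":
    for precision in (3, 6, 10):
        key = geohash_encode(42.736103, -73.750333, precision)
        print(precision, key, cell_size_degrees(precision))
```

Because each extra character subdivides the previous cell, grouping by a key at a lower precision aggregates the same points into coarser cells.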

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
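To make the point-in-polygon semantics concrete, here is a minimal Python sketch of the classic ray-casting test, applied to a small in-memory list of polygons. It is not the product's implementation (which reads TAB or shapefile data and may use a PGD index); the function names and the sample data are hypothetical. It mirrors one behavior described above: every polygon that contains the point produces one output row.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) is inside the ring, a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def search(point, polygons):
    """Return the attributes of every polygon that contains the point."""
    x, y = point
    return [attrs for ring, attrs in polygons if point_in_polygon(x, y, ring)]


if __name__ == "__main__":
    # Hypothetical data: two overlapping squares with made-up attributes.
    polygons = [
        ([(0, 0), (4, 0), (4, 4), (0, 4)], {"state": "A"}),
        ([(3, 3), (6, 3), (6, 6), (3, 6)], {"state": "B"}),
    ]
    print(search((3.5, 3.5), polygons))  # the point lies in both squares
```

A point outside every polygon yields an empty list, which corresponds to a search point that produces no rows in a LATERAL VIEW without OUTER.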

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
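The interaction of the maxCandidates and maxDistance options can be sketched in Python as a brute-force nearest search over point candidates. This is an illustration only, under stated assumptions: distances are great-circle (haversine) distances in meters on a spherical Earth, candidates are plain points, and the names haversine_m and search_nearest are hypothetical. The product's distance computation and indexing are not described in this guide.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical assumption)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, attrs) pairs, nearest first.

    candidates is a list of (lon, lat, attrs) tuples. With max_distance_m set
    to None there is no distance limit, matching the documented default.
    """
    lon, lat = point
    scored = []
    for c_lon, c_lat, attrs in candidates:
        distance = haversine_m(lon, lat, c_lon, c_lat)
        if max_distance_m is None or distance <= max_distance_m:
            scored.append((distance, attrs))
    scored.sort(key=lambda pair: pair[0])
    return scored[:max_candidates]


if __name__ == "__main__":
    # Hypothetical candidates along the prime meridian.
    capitals = [(0.0, 0.0, "A"), (0.0, 1.0, "B"), (0.0, 5.0, "C")]
    for distance, name in search_nearest((0.0, 0.1), capitals, max_candidates=3, max_distance_m=200000):
        print(name, round(distance))
```

The distance filter is applied before the candidate limit, so a small maxDistance can return fewer rows than maxCandidates allows.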

Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, each hexagon is generated with a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 within which to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table and proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin

Hue notebook Integrating the Location Intelligence Jar with Hue

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here.
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.

2. EMR - start here.
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have 775 permissions, which grant Read, Write, and Execute to the owner and group (and Read and Execute to others).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1 Add the group

sudo groupadd pbdownloads

2 Add users to the group

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
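As a quick check of what mode 775 means, the following Python sketch decodes the permission bits and applies them to a directory. It is a hedged illustration, not part of the product: the function names are hypothetical, the example uses a temporary directory rather than the real download location, and it does not change the group (that still requires chgrp with suitable privileges).

```python
import os
import stat
import tempfile


def describe_mode(mode):
    """Return the ls-style permission string for a directory with these bits."""
    return stat.filemode(stat.S_IFDIR | mode)


def prepare_download_dir(path, mode=0o775):
    """Create the directory if needed, set its mode, and return the mode found on disk."""
    os.makedirs(path, exist_ok=True)
    # chmod explicitly: the mode passed to makedirs would be reduced by the umask
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    print(describe_mode(0o775))  # owner rwx, group rwx, others r-x
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "downloads")  # stand-in for the download directory
        print(oct(prepare_download_dir(target)))
```

The group write bit is what lets every service user in the common group update the downloaded data.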

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

=, <>, !=, <, <=, >, >= Attribute operators

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or sub query.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
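The wildcard rules of the Like operator can be sketched in Python by translating a pattern into a regular expression. This is an illustration under assumptions: the helper names are hypothetical, matching is treated as case-sensitive, and no escape character is supported, none of which this guide specifies for MI SQL.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")  # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    for name in ("Central Park", "Parkway", "Harbor"):
        print(name, like(name, "%Park%"))
```

The match is anchored at both ends, so a pattern without a leading or trailing percent sign must match the value from its first character to its last.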

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List item and function argument separator

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain characters that would not normally be allowed are surrounded by double quotes, as with the path separators in this table name:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a doubled single quote (two characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
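When filter expressions are generated by a program, the quote rules above can be applied with small helpers. The Python sketch below is illustrative only: quote_literal and quote_identifier are hypothetical names, and quote_identifier simply wraps the name, on the assumption that the name itself contains no double quotation mark (this guide does not say how such a character would be escaped).

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (assumes no double quote in the name)."""
    if '"' in name:
        raise ValueError("identifier must not contain a double quotation mark")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    business = quote_literal("O'hara's")
    print("SELECT * FROM " + table + " WHERE Business = " + business)
```

Doubling is applied only to values; identifiers and values use different quotation marks, so the two helpers should not be interchanged.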

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 86

Appendix

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives No part of this document may be reproduced or transmitted in any form or by any means electronic or mechanical including photocopying without the written permission of Pitney Bowes Software Inc 350 Jordan Road Troy New York 12180 copy 2020 Pitney Bowes Software Inc All rights reserved Location Intelligence APIs are trademarks of Pitney Bowes Software Inc All other marks and trademarks are property of their respective holders


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
        • Grid Functions
          • GeoHashBoundary
          • GeoHashID
          • HexagonBoundary
          • HexagonID
          • SquareHashBoundary
          • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
          • Hexagons
      • Spark API
        • JoinByDistance
      • Location Intelligence Jar for Spatial Operations
        • Installing and setting up the Location Intelligence Jar for a Spark Job
        • Integrating the Location Intelligence Jar with Zeppelin
        • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules

Spectrum™ Location Intelligence for Big Data Architecture

What is Spectrum™ Location Intelligence for Big Data

Spectrum™ Location Intelligence for Big Data transforms and packages Location Intelligence components into an SDK for Big Data platforms like Hadoop, for use with Spark, MapReduce, and Hive.

The SDK provides:

• Integration APIs for Location Intelligence
• Input datasets and metadata

API Types

• Pre-built Spark and Hive UDF wrappers for Location Intelligence operations
• Core Location Intelligence APIs with sample MapReduce, Hive, and Spark programs (security enabled via Kerberos and Apache Sentry for Hive)


System Requirements and Dependencies

Spectrum™ Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system.

This product is verified on the following Hadoop distributions:

• Cloudera 5.12 and 6.x
• Hortonworks 3.x
• EMR 5.10 and 5.20

To use these jar files, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR, and with developing applications for distributed processing. For more information, refer to the Hortonworks, Cloudera, or EMR documentation.

To use the product, the following must be installed on your system:

for Hive:

• Hive version 1.2.1 or above

for Hive Client:

• Beeline, for example

for Spark and Zeppelin Notebook:

• Java JDK version 1.8 or above
• Hadoop version 2.6.0 or above
• Spark version 2.0 or above
• Zeppelin Notebook is not supported in Cloudera


2 - Spatial

This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations, and the ability to read TAB files.

Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to apply big data processing systems to spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory R-tree creation and searching.

Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.

In this section

• Installing the SDK
• Hive User-Defined Spatial Functions
• Spark

Installing the SDK

To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.

For the purposes of this guide, we will:

• use a user called pbuser
• install everything into /pb

Perform the following steps from a node in your cluster, such as the master node.

1. Create the install directory and give ownership to pbuser.

sudo mkdir /pb
sudo chown pbuser:pbuser /pb

2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:

/pb/temp/spectrum-bigdata-locationintelligence-version.zip

3. Extract the Location Intelligence distribution.

mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-version.zip -d /pb/li/software

4. Create an install directory on HDFS and give ownership to pbuser.

sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb

5. Upload the distribution into HDFS.

hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li


Hive User-Defined Spatial Functions

Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in the spectrum-bigdata-li-hive-<version>.jar.

Refer to the table below to quickly navigate to the Hive UDFs described in this document.

Type | Description | Name
Constructor Functions | Construct an instance of WritableGeometry from supported geometry representation formats | FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point
Grid Functions | Grid processing functions | GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID
Measurement Functions | Geometry measurement functions | Area, ClosestPoints, Distance, Length, Perimeter
Observer Functions | Geometry observer functions | ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
Persistence Functions | Serialize an instance of WritableGeometry to supported geometry representation formats | ToGeoJSON, ToKML, ToWKB, ToWKT
Predicate Functions | Geometry predicate functions | Disjoint, Intersects, Overlaps, Within, IsNullGeometry
Processing Functions | Geometry processing functions | Buffer, ConvexHull, Intersection, Transform, Union
Search Functions | Spatial search functions | LocalPointInPolygon, LocalSearchNearest

Setup

This topic assumes the product is installed to /pb/li/software as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:

1. Proceed according to your platform.

Cloudera:

Copy the Hive jar for Location Intelligence to the HiveServer node:

/pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar

In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:

/pb/li/software/hive/lib/

Hortonworks:

On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:

sudo mkdir /usr/hdp/current/hive-server2/auxlib/

Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar /usr/hdp/current/hive-server2/auxlib/

2. Restart all Hive services.

3. Launch Beeline, or some other Hive client, for the remaining step:

beeline -u jdbc:hive2://localhost:10000/default -n pbuser

4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).

create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
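Because the registration statements follow one pattern, they can be generated rather than typed. The sketch below is a convenience illustration, not a tool shipped with the product. It assumes the class names follow the pattern com.pb.bigdata.spatial.hive.<package>.<FunctionName>, which is inferred from the statements in this guide; `UDF_PACKAGES` and `registration_statements` are hypothetical names.

```python
# Assumed package layout, inferred from the registration statements in this guide.
BASE_PACKAGE = "com.pb.bigdata.spatial.hive"

UDF_PACKAGES = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
    "measurement": ["Area", "ClosestPoints", "Distance", "Length", "Perimeter"],
    "processing": ["ConvexHull", "Intersection", "Buffer", "Union", "Transform"],
    "observer": ["ST_X", "ST_XMax", "ST_XMin", "ST_Y", "ST_YMax", "ST_YMin"],
    "grid": ["GeoHashBoundary", "GeoHashID", "HexagonBoundary", "HexagonID",
             "SquareHashBoundary", "SquareHashID"],
    "search": ["LocalSearchNearest", "LocalPointInPolygon"],
}


def registration_statements(temporary=False):
    """Return one 'create function' statement per UDF, in the order listed above."""
    keyword = "create temporary function" if temporary else "create function"
    return [
        f"{keyword} {name} as '{BASE_PACKAGE}.{package}.{name}';"
        for package, names in UDF_PACKAGES.items()
        for name in names
    ]


statements = registration_statements(temporary=True)
print("\n".join(statements))
```

The printed statements can be saved to a file and run through a Hive client at the start of each session when temporary functions are used.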

Note:

• If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).


WritableGeometry

WritableGeometry is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that perform operations on it. For example:

To calculate the length of a geometry:

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html.


Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions


Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

FromGeoJSON(String jsonGeometry)

Parameters

Parameter | Type | Description
jsonGeometry | String | The geometry in GeoJSON format

Return Values

Return Type | Description
WritableGeometry | The geometry from GeoJSON format

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;
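When the GeoJSON text is produced by a program rather than written by hand, a JSON library avoids malformed input. The following is a minimal sketch, assuming a point geometry with coordinates in longitude, latitude order; `point_geojson` is a hypothetical helper name and is not part of the product.

```python
import json


def point_geojson(lon: float, lat: float) -> str:
    """Serialize a GeoJSON Point; coordinates are [longitude, latitude]."""
    return json.dumps({"type": "Point", "coordinates": [lon, lat]})


geometry_text = point_geojson(100.0, 0.0)
# The GeoJSON text contains only double quotes, so it can sit inside a
# single-quoted SQL string literal without further escaping.
statement = "SELECT FromGeoJSON('" + geometry_text + "');"
print(statement)
```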


FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

FromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter | Type | Description
geometry | String | The geometry in WKT format
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type | Description
WritableGeometry | The geometry from WKT format

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');


FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

FromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter | Type | Description
geometry | String | The WKB of the geometry in byte array format (byte[ ])
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type | Description
WritableGeometry | The geometry from WKB format

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
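To see what the hexadecimal string in the example encodes, the sketch below decodes a WKB point by hand using the standard WKB layout: one byte-order byte, a 4-byte geometry type, then two 8-byte doubles. It only illustrates the format and says nothing about how the product parses WKB; `decode_wkb_point` is a hypothetical helper.

```python
import struct


def decode_wkb_point(hex_string: str):
    """Decode a 2D WKB point from its hexadecimal form and return (x, y)."""
    data = bytes.fromhex(hex_string)
    # First byte: 1 = little-endian, 0 = big-endian.
    byte_order = "<" if data[0] == 1 else ">"
    (geometry_type,) = struct.unpack(byte_order + "I", data[1:5])
    if geometry_type != 1:
        raise ValueError("only WKB points (type 1) are handled in this sketch")
    x, y = struct.unpack(byte_order + "dd", data[5:21])
    return x, y


wkb_hex = (
    "01"                # byte order: little-endian
    "01000000"          # geometry type 1 = Point
    "0000000000002440"  # X as an IEEE 754 double
    "0000000000002440"  # Y as an IEEE 754 double
)
print(decode_wkb_point(wkb_hex))
```

Decoded this way, the example value is the point with X = 10 and Y = 10.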


FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

FromKML(String geometry)

Parameters

Parameter | Type | Description
geometry | String | A KML string, where only the geometry or the geometry in a placemark will be parsed

Return Values

Return Type | Description
WritableGeometry | The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter | Type | Description
X | String or Number | The X ordinate
Y | String or Number | The Y ordinate
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type | Description
WritableGeometry | The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned in the output.

Examples

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;


Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry held by the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry

Return Values

Return Type | Description
GeoJSON String | The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry

Return Values

Return Type | Description
String | The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKB

Description

The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of the geometry held by the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry

Return Values

Return Type | Description
Byte[ ] | The WKB representation of a geometry, expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToKML

Description

The ToKML function returns text formatted in KML, in the OGC standard KML22 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry

Return Values

Return Type | Description
String | The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry

Return Values

Return Type | Description
Boolean | True if the two geometry objects have no points in common; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry

Return Values

Return Type | Description
Boolean | True if there is any direct position in common between the two geometries; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));


Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry

Return Values

Return Type | Description
Boolean | True if geometry1 overlaps geometry2; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry

Return Values

Return Type | Description
Boolean | True if geometry2 entirely contains geometry1; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;


IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter | Type | Description
inputGeometry | WritableGeometry | The input geometry to be checked for a null or empty value

Return Values

Return Type | Description
Boolean | True if the geometry is null or empty; otherwise, False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;


Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of the given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
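The "exterior ring minus interior rings" rule can be illustrated with a short planar calculation. This sketch uses the shoelace formula on Cartesian coordinates only. It is not the product's implementation, it does not cover spherical computation or unit conversion, and the function names are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) tuples (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]  # 10 x 10 square
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]       # 2 x 2 square hole
print(polygon_area(outer, [hole]))            # 100 - 4 = 96.0
```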

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry
areaUnits | String | The desired return unit type. For valid values, see Area Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units:

Value | Description
sq mi | square miles
sq km | square kilometers
sq in | square inches
sq ft | square feet
sq yd | square yards
sq mm | square millimeters
sq cm | square centimeters
sq m | square meters
sq survey ft | square US Survey feet
sq nmi | square nautical miles
acre | acres
ha | hectares

Return Values

Return Type | Description
Double | The area of the geometry

Examples

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;


ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry

Return Values

Return Type | Description
Array<WritableGeometry> | The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2
FROM hivetable t
LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
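The difference between the SPHERICAL and CARTESIAN computation types can be illustrated outside Hive. The following Python sketch is not the product's implementation: it assumes a spherical Earth with a mean radius of 6371008.8 m and uses the haversine formula for the spherical case, and a plain Euclidean distance for the cartesian case. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle (haversine) distance in meters between two long/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Planar distance, expressed in the units of the coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)
```

Applying the cartesian formula to long/lat degrees would produce a value in degrees rather than a ground distance, which is one way to see why geographic coordinate systems accept only SPHERICAL.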

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry.

linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
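The effect of the resolution parameter on a point buffer can be sketched in Python. This is only a planar illustration under the assumption that the buffer vertices lie on the circle of radius offset; it is not the product's algorithm, and the function name is hypothetical.

```python
import math


def point_buffer_ring(x, y, offset, resolution):
    """Return a closed ring of `resolution` straight segments approximating a circle."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = point_buffer_ring(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)  # number of straight-line segments: 8
```

Doubling the resolution doubles the number of vertices in the ring, which is why larger values take more time and space to compute.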

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
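To make the definition of a convex hull concrete, here is a minimal Python sketch that computes the hull of a set of planar points with Andrew's monotone chain algorithm. It is an illustration of the concept only; the algorithm used by the ConvexHull UDF is not stated in this guide, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order, starting at the lowest x."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# → [(0, 0), (2, 0), (2, 2), (0, 2)]
```

The interior point (1, 1) is discarded because it does not lie on the boundary of the smallest convex shape containing all the input points.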

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
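For a sense of what a transformation from epsg:4326 to epsg:3857 involves, the following Python sketch applies the standard spherical Web Mercator forward formula to a single long/lat pair. It covers only this one pair of coordinate systems and is not how the Transform UDF is implemented; the function and constant names are hypothetical.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by epsg:3857


def lonlat_to_web_mercator(lon, lat):
    """Convert WGS 84 longitude/latitude in degrees to epsg:3857 x/y in meters."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(lonlat_to_web_mercator(30.0, 20.0))
```

The formula is undefined at the poles (latitude ±90), so inputs near those latitudes should be treated with care.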

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
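The idea of filtering XY records by the bounds of a geometry can be sketched in Python. The example below computes the four bounding values that correspond to ST_XMin, ST_XMax, ST_YMin, and ST_YMax for a list of vertices and keeps only the XY rows that fall inside them. The function names are hypothetical and the code is an illustration of the concept, not of the UDFs themselves.

```python
def mbr(coords):
    """Return (xmin, xmax, ymin, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), max(xs), min(ys), max(ys)


def filter_by_mbr(rows, coords):
    """Keep the (x, y) rows that lie within the bounding rectangle of coords."""
    xmin, xmax, ymin, ymax = mbr(coords)
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]


polygon = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(filter_by_mbr([(1, 1), (5, 1), (4, 3)], polygon))  # → [(1, 1), (4, 3)]
```

A bounding-rectangle filter is only a coarse test: a row that passes it may still lie outside a non-rectangular geometry.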

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
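The relationship between precision and cell size is easier to see with the public geohash algorithm written out. The Python sketch below implements the standard base-32 geohash encoding; it is assumed, but not confirmed by this guide, that GeoHashID produces IDs of the same form. The function name is hypothetical, and the argument order follows the X (longitude), Y (latitude) convention used above.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Encode a long/lat pair as a standard geohash string of `precision` characters."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(-5.6, 42.6, 5))  # → ezs42
```

Each additional character halves the cell repeatedly in both directions, so a shorter ID is always a prefix of the longer ID for the same point. This prefix property is what makes the IDs sortable and useful for grouping.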

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_pointsid pipresultcapital pipresultstateFROM pip_points LATERAL VIEWLocalPointInPolygon(FromWKT(pip_pointsgeometry pip_pointscrs)

STATECAPTABmap(remoteDataSourceLocation hdfsdatapipcapitalszipdownloadLocation pbpipdownload downloadGroup pbdownloads))

pipresult

In the above example id is a field from the pip_points table which is the table being used to get the points we are searching from The pipresultcapital and pipresultstate fields are from the STATECAP TAB file that we want in our query result

Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
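To make the idea behind a point-in-polygon search concrete, the following is a minimal sketch of the classic ray-casting test in plain Python. It is not the implementation used by LocalPointInPolygon, which this guide does not describe; the function names, the sample region, and the planar-coordinate treatment are all assumptions made for illustration.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) lies inside the polygon given as a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right toggles the state
    return inside


def find_containing(x, y, polygons):
    """Return the names of all polygons (hypothetical name -> ring dict) containing the point."""
    return [name for name, ring in polygons.items() if point_in_polygon(x, y, ring)]


# Hypothetical data: one square region and two search points.
regions = {"square": [(0, 0), (4, 0), (4, 4), (0, 4)]}
print(find_containing(2, 2, regions))
print(find_containing(5, 2, regions))
```

A production search would also use a spatial index so that each point is tested against only a few candidate polygons, which is the kind of cost the PGD index mentioned in the tip is meant to reduce.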


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the nearest geometry or geometries, contained in the specified TAB or shapefile, to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

inputPoint
    Type: WritableGeometry
    Description: A WritableGeometry representing the point to search near.

dataSourcePath
    Type: String
    Description: The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
    Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options
    Type: map
    Description: Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

maxCandidates
    Description: The maximum number of results to return (if not set, the default value is 1).
    Example: 'maxCandidates', '3'

maxDistance
    Description: The maximum distance to search for results (if not set, the default value is no limit).
    Example: 'maxDistance', '25'

distanceUnit
    Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
    Example: 'distanceUnit', 'mi'

returnDistanceColumnName
    Description: The name of the column to use for returning the distance.
    Example: 'returnDistanceColumnName', 'Miles'

shpCharset
    Description: The charset to use when reading a shapefile.
    Example: 'shpCharset', 'utf-8'

shpCrs
    Description: The coordinate reference system to use when reading a shapefile.
    Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
    Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
    Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
    Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
    Example: 'downloadLocation', '/pb/downloads'
    Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
    Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
    Example: 'downloadGroup', 'pbdownloads'

queryFilter
    Description: Search on a subset of the data based on an attribute.
    Example: 'queryFilter', "Name Like '%Park%'"
    Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type    Description

geometry    The nearest geometry or geometries, contained in the specified TAB or shapefile, to the input point.

Examples

Using HDFS:

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map('maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
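The effect of the maxCandidates and maxDistance options can be sketched in a few lines of Python. This is an illustration only: it searches a list of candidate points rather than TAB geometries, it measures distance with the haversine formula on a spherical Earth (the guide does not state how the product computes distance), and every name in it is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    candidates is a list of (name, lon, lat) tuples; max_distance_m=None means no limit,
    mirroring the documented defaults of maxCandidates and maxDistance.
    """
    lon, lat = point
    scored = []
    for name, c_lon, c_lat in candidates:
        distance = haversine_m(lon, lat, c_lon, c_lat)
        if max_distance_m is None or distance <= max_distance_m:
            scored.append((distance, name))
    scored.sort()
    return [(name, distance) for distance, name in scored[:max_candidates]]


# Hypothetical candidates spaced along the equator.
candidates = [("far", 2.0, 0.0), ("near", 0.1, 0.0), ("mid", 1.0, 0.0)]
print(search_nearest((0.0, 0.0), candidates))
print(search_nearest((0.0, 0.0), candidates, max_candidates=3, max_distance_m=150000))
```

With the defaults only the single nearest candidate comes back; raising max_candidates returns more rows per search point, and max_distance_m discards anything beyond the search radius before the limit is applied.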


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
    --master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
    /dir/on/server/spectrum-bigdata-li-spark2-hexgen-<version>.jar \
    -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, it always generates the same hexagon with the same unique ID.
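The aggregation pattern described above, in which each record is assigned a cell ID and values are then grouped by that ID, can be sketched as follows. The cell function here is a loudly hypothetical stand-in: it uses a plain square grid, not the product's hexagon network, whose API is not reproduced in this guide. Only the group-by-ID step is the point of the example.

```python
import math
from collections import defaultdict


def stand_in_cell_id(lon, lat, cell_size_deg=1.0):
    """Hypothetical placeholder for a hexagon ID lookup: returns a square-grid cell key."""
    return (math.floor(lon / cell_size_deg), math.floor(lat / cell_size_deg))


def aggregate_by_cell(records, cell_id=stand_in_cell_id):
    """Average the value of (lon, lat, value) records that fall in the same cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in records:
        key = cell_id(lon, lat)  # same location and level always give the same ID
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical signal-strength readings: two in one cell, one in a neighboring cell.
readings = [(0.2, 0.3, 10.0), (0.8, 0.9, 20.0), (1.5, 0.5, 5.0)]
print(aggregate_by_cell(readings))
```

Because the ID is stable for a given location and level, the same grouping can be done in a distributed job by using the ID as the key of a reduce or group-by step.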

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

df2
    Type: DataFrame
    Description: The dataframe to join to.

df1Longitude
    Type: Column
    Description: The longitude value from the first dataframe.

df1Latitude
    Type: Column
    Description: The latitude value from the first dataframe.

df2Longitude
    Type: Column
    Description: The longitude value from the second dataframe.

df2Latitude
    Type: Column
    Description: The latitude value from the second dataframe.

maxDistance
    Type: Length
    Description: The buffer length around point 1 to search for point 2.

geohashPrecision
    Type: Integer
    Description: The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options
    Type: Map
    Description: Optional. Options that add extra attributes to the result of the join.
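To give a feel for what the geohashPrecision value controls, here is a small sketch of the standard public geohash encoding together with the cell size at each precision. How the product uses geohashes internally for its primary join is not described in this guide, so treat this purely as background; the function names are illustrative.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision):
    """Encode a WGS84 latitude/longitude as a geohash of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value, bit_count = 0, 0
    use_lon = True  # bits alternate between the axes, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value = value * 2
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def cell_size_degrees(precision):
    """Return (longitude span, latitude span) in degrees of one geohash cell."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude receives the extra bit when odd
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


for p in (1, 5, 7):
    print(p, geohash_encode(57.64911, 10.40744, p), cell_size_degrees(p))
```

Each extra character divides a cell into 32 smaller cells, so a higher precision means many more distinct cells to track, which is consistent with the note that higher values may require more memory.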


Options

Key    Type    Description

DistanceColumnName    String    Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type    Description

DataFrame    The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
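The semantics of a distance join can be illustrated without Spark. The sketch below pairs every record of one list with every record of another that lies within a maximum distance, and attaches the distance, much as the DistanceColumnName option adds a distance column. It is a naive nested loop with haversine distances on a spherical Earth; the names and sample data are hypothetical, and it does not reflect how joinByDistance is implemented.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters (spherical approximation)
METERS_PER_MILE = 1609.344   # international mile


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair each left row with every right row within max_distance_m meters.

    Rows are dicts with 'lon' and 'lat' keys (WGS84 degrees).
    """
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["lon"], l_row["lat"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                joined.append({"left": l_row, "right": r_row, "distance_m": d})
    return joined


# Hypothetical customers and points of interest near the equator.
customers = [{"id": "c1", "lon": 0.0, "lat": 0.0}]
pois = [{"id": "p1", "lon": 0.001, "lat": 0.0}, {"id": "p2", "lon": 1.0, "lat": 0.0}]
rows = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([(r["left"]["id"], r["right"]["id"], round(r["distance_m"], 1)) for r in rows])
```

The nested loop compares every pair, which does not scale; a first-pass join on a coarse grid key such as a geohash is a common way to avoid comparing points that are obviously far apart.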


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application    Do this

Spark job    Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook    Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook    Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.

2. Start the Spark job using the following command.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver \
    --master yarn --deploy-mode cluster \
    --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar \
    /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

-f <file>
    Required: yes
    Description: Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel>
    Required: no
    Description: The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
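When several TAB files need an index, the command lines can be generated rather than typed. The helper below is a minimal sketch that assembles the argument list from the two documented parameters, -f and -p. The executable name passed in is an assumption (the launcher script name in your utilities directory may differ), and the helper only builds the command; it does not run it.

```python
def pgd_builder_command(tab_path, parallel=None, executable="PGDBuilder"):
    """Build the argument list for one PGD Builder run.

    executable is an assumed launcher name; only the documented -f and -p
    parameters are used.
    """
    if parallel is not None and parallel < 1:
        raise ValueError("parallel must be a positive integer")
    command = [executable, "-f", tab_path]
    if parallel is not None:
        command += ["-p", str(parallel)]  # cap on concurrent processes
    return command


# Hypothetical file names: one ordinary TAB and one seamless TAB limited to 4 threads.
print(pgd_builder_command(r"C:\data\uk.tab"))
print(pgd_builder_command(r"C:\seamless.tab", parallel=4))
```

The resulting lists can be handed to subprocess.run, which avoids shell quoting problems with paths that contain spaces.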


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window in which no job is running, restart all the services whose operating system users were added to the new group.

4. During a window in which no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
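After changing the group and mode, it can be useful to confirm that the directory ended up with the intended permission bits. The following sketch checks a directory against mode 775 using only the Python standard library. The function names are illustrative, and the path you pass in (for example, your own download location) is up to you; nothing here is part of the product.

```python
import os
import stat


def has_mode(path, expected_mode=0o775):
    """Return True if the rwx permission bits of path equal expected_mode."""
    return (stat.S_IMODE(os.stat(path).st_mode) & 0o777) == expected_mode


def group_has_full_access(path):
    """Return True if the owning group can read, write, and enter the directory."""
    mode = os.stat(path).st_mode
    needed = stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP
    return (mode & needed) == needed


if __name__ == "__main__":
    # Check the current directory; substitute your own download location.
    print(has_mode("."), group_has_full_access("."))
```

A result of False for either check means that services in the shared group would not be able to update downloaded data in that directory.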


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator    Definition

Attribute operators    =, <>, !=, <, <=, >, >=

Between    Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects    Returns true if the envelopes (MBRs) of the operands intersect.

Contains    Returns true if the first object contains all of the second object.

Within    Returns true if the first object is entirely inside the second object.

ContainsCentroid    Returns true if the first object contains the centroid of the second object.

CentroidWithin    Returns true if the first object's centroid is within the second object.

Intersects    Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List)    Returns true if the value equals at least one of the values in the literal list or subquery.

Like    Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND    Returns true if both conditions in the WHERE clause are true.

OR    Returns true if either the first or second condition is true.

NOT    Reverses the meaning of the logical operator with which it is used.
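The wildcard rules of the Like operator can be modeled by translating a pattern into a regular expression. The sketch below does this in Python so that a filter pattern can be tried against sample values before it is used in a query. It covers only the two wildcards described above; case sensitivity and any escape syntax are assumptions here, since this guide does not specify them.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    % matches zero, one, or many characters; _ matches exactly one character.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern (case-sensitive here)."""
    return like_to_regex(pattern).fullmatch(value) is not None


# Hypothetical attribute values checked against patterns.
for name in ("Central Park", "Parking Lot", "Plaza"):
    print(name, like(name, "%Park%"), like(name, "%Park"))
```

Note that the pattern must match the entire value, so '%Park' matches only values that end in Park, whereas '%Park%' matches Park anywhere in the value.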


Arithmetic Operators

Operator    Definition

+    Addition; also the concatenation operator. Note: String concatenation also uses &.

-    Subtraction

*    Multiplication

/    Division

^    Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter    Definition

( )    Expression delimiters

' '    String constant delimiters. See Quote Rules on page 85.

" "    Quoted identifier delimiters

% _    Wildcard symbols. % represents zero, one, or more characters; the _ (underscore) represents a single character.

,    List items and function argument separators

@    Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In cases where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
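When filter expressions are assembled in code, the quoting rules above are easy to get wrong by hand. The helpers below are a small sketch of the two rules: single quotes around string literals with embedded single quotes doubled, and double quotes around identifiers. The function names are illustrative, and rejecting identifiers that contain a double quote is a conservative assumption, because this guide does not describe how such identifiers are escaped.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Identifiers containing a double quote are rejected, because their escaping
    is not covered by the quoting rules described in this guide.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


# Hypothetical table and value, matching the style of the examples above.
query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("/Samples/NamedTables/USA"), quote_literal("O'haras")
)
print(query)
```

Building the literal through a helper keeps a value such as O'haras from terminating the string early, which would otherwise produce a syntax error or an unintended filter.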


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


System Requirements and Dependencies

Spectrum™ Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system.

This product is verified on the following Hadoop distributions:

• Cloudera 5.12 and 6.x
• Hortonworks 3.x
• EMR 5.10 and 5.20

To use these jar files, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR, and with developing applications for distributed processing. For more information, refer to the Hortonworks, Cloudera, or EMR documentation.

To use the product, the following must be installed on your system:

For Hive:

• Hive version 1.2.1 or above

For Hive Client:

• Beeline, for example

For Spark and Zeppelin Notebook:

• Java JDK version 1.8 or above
• Hadoop version 2.6.0 or above
• Spark version 2.0 or above
• Zeppelin Notebook is not supported in Cloudera
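The "version X or above" requirements can be checked mechanically once you know the versions installed on a cluster. The sketch below compares dotted version strings numerically against the minimums listed above. How you obtain the installed versions is left out, the sample values are hypothetical, and the helper handles only purely numeric version strings.

```python
MINIMUM_VERSIONS = {"hive": "1.2.1", "java": "1.8", "hadoop": "2.6.0", "spark": "2.0"}


def parse_version(text):
    """Split a purely numeric dotted version such as '2.6.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if installed is the same as or newer than minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2' and '2.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def unmet_requirements(installed_versions):
    """Return the components whose installed version is below the documented minimum."""
    return sorted(
        name
        for name, minimum in MINIMUM_VERSIONS.items()
        if name in installed_versions and not meets_minimum(installed_versions[name], minimum)
    )


# Hypothetical cluster inventory.
print(unmet_requirements({"hive": "1.2.0", "hadoop": "2.7.3", "spark": "2.4.0", "java": "1.8"}))
```

Comparing tuples of integers avoids the classic string-comparison mistake in which '1.10' sorts before '1.9'.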


2 - Spatial

This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files.

Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to apply big data processing systems to spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory R-tree creation and searching.

Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.

In this section

Installing the SDK
Hive User-Defined Spatial Functions
Spark


Installing the SDK

To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.

For the purposes of this guide, we will:

• use a user called pbuser
• install everything into /pb

Perform the following steps from a node in your cluster, such as the master node.

1. Create the install directory and give ownership to pbuser:

sudo mkdir /pb
sudo chown pbuser:pbuser /pb

2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:

/pb/temp/spectrum-bigdata-locationintelligence-<version>.zip

3. Extract the Location Intelligence distribution:

mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-<version>.zip -d /pb/li/software

4. Create an install directory on HDFS and give ownership to pbuser:

sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb

5. Upload the distribution into HDFS:

hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li


Hive User-Defined Spatial Functions

Hive user-defined functions (UDFs) let you run MapReduce jobs using SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum™ Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in the spectrum-bigdata-li-hive-<version>.jar.

Refer to the table below to quickly navigate to the Hive UDFs described in this document.

Constructor Functions on page 15
    Description: Construct an instance of WritableGeometry from supported geometry representation formats
    Names: FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point

Grid Functions on page 53
    Description: Grid processing functions
    Names: GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID

Measurement Functions on page 29
    Description: Geometry measurement functions
    Names: Area, ClosestPoints, Distance, Length, Perimeter

Observer Functions on page 46
    Description: Geometry observer functions
    Names: ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin

Persistence Functions on page 20
    Description: Serialize an instance of WritableGeometry to supported geometry representation formats
    Names: ToGeoJSON, ToKML, ToWKB, ToWKT

Predicate Functions on page 24
    Description: Geometry predicate functions
    Names: Disjoint, Intersects, Overlaps, Within, IsNullGeometry

Processing Functions on page 39
    Description: Geometry processing functions
    Names: Buffer, ConvexHull, Intersection, Transform, Union

Search Functions on page 66
    Description: Spatial search functions
    Names: LocalPointInPolygon, LocalSearchNearest

Setup

This topic assumes the product is installed to /pb/li/software as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps.

1. Proceed according to your platform.

   Cloudera

   Copy the Hive jar for Location Intelligence to the HiveServer node:

       /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar

   In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:

       /pb/li/software/hive/lib/

   Hortonworks

   On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:

       sudo mkdir /usr/hdp/current/hive-server2/auxlib/

   Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

       sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/

2. Restart all Hive services.

3. Launch Beeline, or some other Hive client, for the remaining step.

    beeline -u jdbc:hive2://localhost:10000/default -n pbuser

4. Register the spatial user-defined functions. Add the temporary keyword after create, as in the statements below, if you want a temporary function (this step would need to be redone for every new Hive session).

    create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
    create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
    create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
    create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
    create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

    create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
    create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
    create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
    create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

    create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
    create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
    create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
    create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
    create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

    create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
    create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
    create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
    create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
    create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

    create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
    create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
    create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
    create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
    create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

    create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
    create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
    create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
    create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
    create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
    create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

    create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
    create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
    create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
    create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
    create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
    create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

    create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
    create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
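The registration statements in step 4 all follow one pattern: a package under com.pb.bigdata.spatial.hive, then a class named after the function. As a minimal sketch, the statements could be generated rather than typed by hand. The helper name registration_statements and the UDFS dictionary are hypothetical and only mirror the list in step 4; they are not part of the product.

```python
# Hypothetical helper: reproduces the registration statements from step 4.
# The package and function names are copied from this guide.
PACKAGE_ROOT = "com.pb.bigdata.spatial.hive"

UDFS = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
    "measurement": ["Area", "ClosestPoints", "Distance", "Length", "Perimeter"],
    "processing": ["ConvexHull", "Intersection", "Buffer", "Union", "Transform"],
    "observer": ["ST_X", "ST_XMax", "ST_XMin", "ST_Y", "ST_YMax", "ST_YMin"],
    "grid": ["GeoHashBoundary", "GeoHashID", "HexagonBoundary", "HexagonID",
             "SquareHashBoundary", "SquareHashID"],
    "search": ["LocalSearchNearest", "LocalPointInPolygon"],
}


def registration_statements(temporary=True):
    """Return one 'create [temporary] function' statement per UDF."""
    keyword = "create temporary function" if temporary else "create function"
    return [
        f"{keyword} {name} as '{PACKAGE_ROOT}.{package}.{name}';"
        for package, names in UDFS.items()
        for name in names
    ]


if __name__ == "__main__":
    # Print the script; it can be pasted into Beeline or saved to a file.
    print("\n".join(registration_statements()))
```

Passing temporary=False produces the permanent form shown in the Function Registration heading of each function topic.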

WritableGeometry

WritableGeometry is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

    SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

    SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

    SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that perform operations on it. For example:

To calculate the length of a geometry:

    SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

    SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. It returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.

    SELECT IsNullGeometry(null);

    SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

    SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html.

Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions

Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

    create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

    FromGeoJSON(String jsonGeometry)

Parameters

Parameter     Type    Description
jsonGeometry  String  The geometry in GeoJSON format.

Return Values

Return Type       Description
WritableGeometry  The geometry from GeoJSON format.

Examples

    SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

    SELECT FromGeoJSON(t.geometry) FROM hivetable t;
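The literal in the first example is an ordinary GeoJSON Point object. As a small sketch, such a string can be produced with a JSON library instead of by hand, which avoids quoting mistakes. The helper name point_geojson is hypothetical; only the FromGeoJSON call comes from this guide.

```python
import json


def point_geojson(x, y):
    """Return a GeoJSON Point string; GeoJSON lists X (longitude) before Y (latitude)."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


geojson = point_geojson(100.0, 0.0)

# Embed the string in the Hive query from the example above.
# Assumption: the GeoJSON text contains no single quotes that need escaping.
query = f"SELECT FromGeoJSON('{geojson}');"
print(query)
```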

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

    create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

    FromWKT(String geometry [, SpatialInfo CRS])

Parameters

Parameter  Type    Description
geometry   String  The geometry in WKT format.
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type       Description
WritableGeometry  The geometry from WKT format.

Examples

    SELECT FromWKT(t.geometry) FROM hivetable t;

    SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

    SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

    create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

    FromWKB(String geometry [, SpatialInfo CRS])

Parameters

Parameter  Type    Description
geometry   String  The WKB of the geometry in byte array format (byte[ ]).
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type       Description
WritableGeometry  The geometry from WKB format.

Examples

    SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
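To see what the hexadecimal literal in the example encodes, the following sketch decodes it with the standard WKB layout for a 2D point: one byte-order byte, a 4-byte geometry type, then two 8-byte doubles. The function decode_wkb_point is a hypothetical illustration that handles points only; it is not how the product parses WKB.

```python
import struct


def decode_wkb_point(hex_wkb):
    """Decode a hex-encoded 2D WKB point and return its (x, y) ordinates."""
    raw = bytes.fromhex(hex_wkb)
    # First byte: 1 = little-endian (NDR), 0 = big-endian (XDR).
    byte_order = "<" if raw[0] == 1 else ">"
    # Next four bytes: geometry type code (1 = Point).
    (geom_type,) = struct.unpack(byte_order + "I", raw[1:5])
    if geom_type != 1:
        raise ValueError("this sketch only handles 2D WKB points (type 1)")
    # Remaining sixteen bytes: X and Y as IEEE 754 doubles.
    x, y = struct.unpack(byte_order + "dd", raw[5:21])
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

The literal in the example therefore describes the point with X = 10 and Y = 10.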

FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

    create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

    FromKML(String geometry)

Parameters

Parameter  Type    Description
geometry   String  A KML string, where only the geometry or the geometry in a placemark will be parsed.

Return Values

Return Type       Description
WritableGeometry  The geometry from KML format.

Examples

    SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.

Function Registration

    create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

    ST_Point(String|Number X, String|Number Y [, SpatialInfo CRS])

Parameters

Parameter  Type              Description
X          String or Number  The X ordinate.
Y          String or Number  The Y ordinate.
CRS        String            Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type       Description
WritableGeometry  The geometry of the specified XY coordinates. If any of the argument values are invalid, an empty geometry is returned.

Examples

    SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

    SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

    SELECT ST_Point(-73.750333, 42.736103);

    SELECT ST_Point('-73.750333', '42.736103');

    SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.

Function Registration

    create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

    ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type     Description
GeoJSON String  The GeoJSON representation of a geometry.

Examples

    SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.

Function Registration

    create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

    ToWKT(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type  Description
String       The WKT representation of a geometry.

Examples

    SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.

Function Registration

    create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

    ToWKB(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type  Description
Byte[ ]      The WKB representation of a geometry, expressed as a byte array.

Examples

    SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted in KML, in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

    create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

    ToKML(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type  Description
String       The KML representation of a geometry, expressed as a hexadecimal encoded string.

Examples

    SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

    create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

    Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if the two geometry objects have no points in common; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

    SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

    create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

    Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if there is any direct position in common between the two geometries; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

    SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

    SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
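By their definitions, Intersects (any position in common) and Disjoint (no points in common) are complementary for non-null inputs. The sketch below illustrates this relationship for the simplest possible case, closed axis-aligned rectangles given as (xmin, ymin, xmax, ymax) tuples. The function names are hypothetical, and the rectangle test is a simplification, not the product's algorithm for arbitrary geometries.

```python
def boxes_intersect(a, b):
    """True if closed rectangles a and b share at least one point.

    Each rectangle is a tuple (xmin, ymin, xmax, ymax).
    """
    axmin, aymin, axmax, aymax = a
    bxmin, bymin, bxmax, bymax = b
    # The rectangles overlap on both axes, or at least touch.
    return axmin <= bxmax and bxmin <= axmax and aymin <= bymax and bymin <= aymax


def boxes_disjoint(a, b):
    """True if the rectangles have no points in common."""
    return not boxes_intersect(a, b)


unit = (0, 0, 1, 1)
print(boxes_intersect(unit, (1, 1, 2, 2)))  # share only the corner point (1, 1)
print(boxes_disjoint(unit, (2, 2, 3, 3)))   # no shared point
```

Note that rectangles which merely touch at a corner or an edge still count as intersecting, because they share a position.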

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

    create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

    Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if geometry1 overlaps geometry2; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

    SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

    create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

    Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if geometry2 entirely contains geometry1; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

    SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

    SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
    FROM book_of_business L, FIRE_RISK_BOUNDRIES R
    WHERE Within(FromWKT(L.location), FromWKT(R.geom))
    GROUP BY L.zipcode;
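The second example asks, for each insured location, which risk boundary contains it. For the common point-in-polygon case, the underlying test can be sketched with the classic ray-casting technique on planar coordinates. This is an illustration of the idea only: the function name is hypothetical, holes are ignored, points exactly on the boundary are not handled specially, and the product's own implementation is not documented here.

```python
def point_within_polygon(point, ring):
    """Ray casting: count how many polygon edges a ray going right from the point crosses.

    ring is a list of (x, y) vertices; an odd number of crossings means inside.
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_within_polygon((5, 5), boundary))
print(point_within_polygon((15, 5), boundary))
```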

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

    create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

    IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter      Type              Description
inputGeometry  WritableGeometry  The input geometry to be checked for a null or empty value.

Return Values

Return Type  Description
Boolean      True if the geometry is null or empty; otherwise, False.

Examples

    SELECT IsNullGeometry(null);

    SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

    SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

    create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

    Area(WritableGeometry geometry, String areaUnits [, String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
areaUnits        String            The desired return unit type. For valid values, see Area Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates.

The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units:

Value         Description
sq mi         square miles
sq km         square kilometers
sq in         square inches
sq ft         square feet
sq yd         square yards
sq mm         square millimeters
sq cm         square centimeters
sq m          square meters
sq survey ft  square US Survey feet
sq nmi        square nautical miles
acre          acres
ha            hectares

Return Values

Return Type  Description
Double       The area of the geometry.

Examples

    SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

    SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
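The rule from the description, exterior ring area minus interior ring areas, can be sketched for the CARTESIAN case with the shoelace formula. This is an illustrative sketch on planar coordinates only; the function names are hypothetical, the result is in squared coordinate units, and the SPHERICAL computation used by the product is not reproduced here.

```python
def ring_area(ring):
    """Planar area enclosed by a ring of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    # abs() makes the result independent of the ring's winding direction.
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(exterior, [hole]))
```

A ring may be passed with or without its closing vertex repeated; the repeated vertex contributes a zero term.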

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

    create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

    ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type              Description
Array<WritableGeometry>  The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

    SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2
    FROM hivetable t
    LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

    create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

    Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])

Parameters

Parameter        Type              Description
geometry1        WritableGeometry  The first instance of a WritableGeometry.
geometry2        WritableGeometry  The second instance of a WritableGeometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates.

The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type  Description
Double       The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

    SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

    SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
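To make the difference between the two computation types concrete, the sketch below compares a planar (Cartesian) distance with a great-circle distance between two points, and converts a result in meters to the linear units listed above. It rests on assumptions that are not from this guide: the haversine formula and a mean Earth radius of 6,371,008.8 m stand in for "spherical logic", and all function names are hypothetical. The product may compute spherical distances differently.

```python
import math

# Meters per linear unit, keyed by the unit values listed in this guide.
METERS_PER_UNIT = {
    "mi": 1609.344, "km": 1000.0, "in": 0.0254, "ft": 0.3048, "yd": 0.9144,
    "mm": 0.001, "cm": 0.01, "m": 1.0, "survey ft": 1200.0 / 3937.0, "nmi": 1852.0,
}

EARTH_RADIUS_M = 6371008.8  # assumption: mean Earth radius in meters


def cartesian_distance(p1, p2):
    """Straight-line distance between two (x, y) points, in coordinate units."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def convert_from_meters(meters, unit):
    """Express a distance given in meters in one of the linear units."""
    return meters / METERS_PER_UNIT[unit]


d = spherical_distance_m(-73.750333, 42.736103, -74.0, 40.7)
print(convert_from_meters(d, "km"), convert_from_meters(d, "mi"))
```

Applying cartesian_distance directly to long/lat values would return a number in degrees, which is why geographic coordinate systems accept only the SPHERICAL type.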

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

    create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

    Length(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates.

The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type  Description
Double       The length of the geometry.

Examples

    SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

    SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered thin polygons.

Function Registration

    create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

    Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates.

The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type  Description
Double       The perimeter of the geometry.

Examples

    SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

    SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
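The description states that a polygon's perimeter is the sum of the lengths of all of its rings, holes included. A minimal sketch of that rule for the CARTESIAN case follows; the same segment-summing step also describes the length of a polyline. The function names are hypothetical, rings are assumed to be closed (first vertex repeated at the end), and the treatment of curves as thin polygons is not modeled.

```python
import math


def path_length(points):
    """Sum of the straight segment lengths along a sequence of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def polygon_perimeter(exterior, interiors=()):
    """Length of the exterior ring plus the lengths of all interior rings."""
    return path_length(exterior) + sum(path_length(ring) for ring in interiors)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
print(polygon_perimeter(exterior, [hole]))
```

Unlike Area, where holes are subtracted, holes add to the perimeter.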

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

    create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

    Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The geometry to buffer.
offset           Number            The distance from the input geometry.
linearUnits      String            The unit of the offset distance. For valid values, see Linear Units.
resolution       Number            Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates.

The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type       Description
WritableGeometry  A geometry consisting of all the points within the offset distance of the input geometry.

Examples

    SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

    SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
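The effect of the resolution parameter is easiest to see for a point buffer. The sketch below builds the ring of a planar point buffer as a regular polygon with resolution segments, so a resolution of 8 gives an octagon, as the parameter description says. The function name is hypothetical, and placing the vertices exactly on the circle is an assumption; the product may position them differently and also handles spherical coordinates.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Return a closed ring approximating a circle of radius offset around (x, y).

    The ring has `resolution` straight segments (resolution + 1 vertices,
    because the first vertex is repeated to close the ring).
    """
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])
    return ring


octagon = buffer_point(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1, "segments")
```

Doubling the resolution doubles the number of vertices that every later operation on the buffer has to process, which is the time and space cost the parameter description refers to.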

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
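
As an illustration of what a convex hull is, the sketch below computes one for the vertices of the MULTIPOLYGON in the last example, using Andrew's monotone chain algorithm in plain Python. This is not the product's implementation; `convex_hull` is a hypothetical helper that treats coordinates as planar.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain. Returns hull vertices in counter-clockwise
    order, starting from the lowest-x point, without repeating the first."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    vertices = [(40, 40), (20, 45), (45, 30),
                (20, 35), (10, 30), (10, 10), (30, 5), (45, 20),
                (30, 20), (20, 15), (20, 25)]
    print(convex_hull(vertices))
```

Interior vertices such as (20, 35) and the hole vertices drop out, leaving only the outermost points.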


Intersection

Description

The Intersection function returns the geometry (a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
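
For intuition about what a transformation from epsg:4326 to epsg:3857 involves, the sketch below applies the standard spherical Web Mercator forward formula in Python. It is an illustration only: `lonlat_to_web_mercator` is a hypothetical helper, and the Transform UDF relies on the product's own coordinate system library rather than this formula.

```python
import math

EARTH_RADIUS_M = 6378137.0  # semi-major axis used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Project WGS 84 longitude/latitude (degrees) to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


if __name__ == "__main__":
    # The point used in the second example above: longitude 30, latitude 20.
    print(lonlat_to_web_mercator(30.0, 20.0))
```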


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
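
The MBR filtering idea can be sketched outside Hive as follows. This Python example is only an illustration under simple planar assumptions: `mbr` and `filter_by_mbr` are hypothetical helpers that mirror what the four min/max observer functions report for a geometry.

```python
def mbr(vertices):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, box):
    """Keep the (x, y) rows that fall inside the bounding rectangle."""
    x_min, y_min, x_max, y_max = box
    return [(x, y) for x, y in rows
            if x_min <= x <= x_max and y_min <= y <= y_max]


if __name__ == "__main__":
    triangle = [(40, 40), (20, 45), (45, 30)]
    box = mbr(triangle)
    print(box)
    print(filter_by_mbr([(30, 35), (10, 10), (45, 45)], box))
```

Filtering on the MBR is a cheap first pass; rows that pass it may still lie outside the geometry itself.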


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
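
The relationship between a geohash string and its rectangular cell can be sketched with the public geohash scheme. The Python below is an illustrative assumption, not the product's code: `geohash_bounds` and `bounds_to_wkt` are hypothetical helpers, and the IDs produced by the GeoHashID UDF are not guaranteed to match the public scheme in every detail.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(cell_id):
    """Decode a geohash into (min_lon, min_lat, max_lon, max_lat)."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon_bit = True  # bits alternate, starting with longitude
    for char in cell_id:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon_bit:
                mid = (lon_lo + lon_hi) / 2.0
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2.0
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon_bit = not is_lon_bit
    return lon_lo, lat_lo, lon_hi, lat_hi


def bounds_to_wkt(bounds):
    """Render the cell rectangle as a closed WKT polygon."""
    x0, y0, x1, y1 = bounds
    return (f"POLYGON (({x0} {y0}, {x1} {y0}, {x1} {y1}, "
            f"{x0} {y1}, {x0} {y0}))")


if __name__ == "__main__":
    print(bounds_to_wkt(geohash_bounds("ezs42")))
```

Each additional character halves the cell several more times, which is why longer strings correspond to smaller cells.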


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
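
The last query groups points by cell ID and counts them. The same pattern can be sketched in Python with the public geohash encoding. This is an assumption-laden illustration: `geohash_id` is a hypothetical helper that follows the public algorithm, which may differ in detail from the GeoHashID UDF.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash of `precision` chars."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value = 0
    bits = 0
    is_lon_bit = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon_bit:
            mid = (lon_lo + lon_hi) / 2.0
            if x >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2.0
            if y >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        is_lon_bit = not is_lon_bit
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)


def count_by_cell(points, precision):
    """Aggregate (x, y) points into a {cell_id: count} mapping."""
    return Counter(geohash_id(x, y, precision) for x, y in points)


if __name__ == "__main__":
    pts = [(-5.60, 42.60), (-5.61, 42.61), (10.40744, 57.64911)]
    for cell, quantity in sorted(count_by_cell(pts, 5).items()):
        print(cell, quantity)
```

Because IDs of nearby points share a prefix, sorting by ID keeps spatially close records together.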


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned: if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned: if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
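
The core test behind a point-in-polygon search can be sketched with the classic ray-casting method. The Python below is a simplified planar illustration, not the product's algorithm: `point_in_ring` and `polygons_containing` are hypothetical helpers, holes and points exactly on an edge are not handled, and the sample regions are made up.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray going right from
    (x, y) crosses. An odd count means the point is inside."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y):  # edge straddles the horizontal ray
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygons_containing(x, y, polygons):
    """Return the names of all polygons whose ring contains (x, y)."""
    return [name for name, ring in polygons.items()
            if point_in_ring(x, y, ring)]


if __name__ == "__main__":
    # Hypothetical regions; a real search would read them from a data source.
    regions = {
        "west": [(0, 0), (10, 0), (10, 10), (0, 10)],
        "east": [(10, 0), (20, 0), (20, 10), (10, 10)],
        "big": [(-5, -5), (25, -5), (25, 15), (-5, 15)],
    }
    print(polygons_containing(3, 4, regions))
```

As with the UDTF, a single input point can match more than one geometry, so the result is a list.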


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
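
The interplay of the maxCandidates and maxDistance options can be sketched as a brute-force nearest search in Python. Everything here is an assumption for illustration: `haversine_m` and `search_nearest` are hypothetical helpers, distances are great-circle distances on a sphere of mean radius 6371008.8 m, candidates are simple points, and the product uses its own data sources and indexing.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    return 2.0 * radius_m * math.asin(math.sqrt(a))


def search_nearest(lon, lat, candidates, max_candidates=1, max_distance_m=None):
    """Return up to `max_candidates` (name, distance) pairs, nearest first,
    dropping candidates farther away than `max_distance_m` when it is set."""
    scored = sorted(
        (haversine_m(lon, lat, c_lon, c_lat), name)
        for name, (c_lon, c_lat) in candidates.items()
    )
    if max_distance_m is not None:
        scored = [(d, n) for d, n in scored if d <= max_distance_m]
    return [(n, d) for d, n in scored[:max_candidates]]


if __name__ == "__main__":
    # Hypothetical candidate points keyed by name: (longitude, latitude).
    places = {"a": (0.0, 1.0), "b": (0.0, 2.0), "c": (0.0, 3.0)}
    print(search_nearest(0.0, 0.0, places, max_candidates=3))
```

With the defaults only the single nearest candidate is returned, matching the default of 1 described for maxCandidates.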


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always yields a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter         Type       Description
df2               DataFrame  The dataframe to join to.
df1Longitude      Column     The longitude value from the first dataframe.
df1Latitude       Column     The latitude value from the first dataframe.
df2Longitude      Column     The longitude value from the second dataframe.
df2Latitude       Column     The latitude value from the second dataframe.
maxDistance       Length     The buffer length around point 1 to search for point 2.
geohashPrecision  Integer    The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options           Map        Optional. Options that add extra attributes to the result of the join.
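The geohashPrecision parameter is easier to reason about with the public geohash algorithm in view. The sketch below is a plain-Python encoder, not the SDK's code; how the SDK uses geohashes internally is not described in this guide, and the function name `geohash_encode` is hypothetical. It shows that each additional character of precision narrows the cell containing a point.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon, lat, precision):
    """Encode a lon/lat pair (degrees) as a geohash string of the given length."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(10.40744, 57.64911, 4)
fine = geohash_encode(10.40744, 57.64911, 7)
print(coarse, fine)
```

A longer geohash is always an extension of the shorter one for the same point, so higher precision means smaller cells and therefore more distinct join keys to hold in memory.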

Options

Key                 Type    Description
DistanceColumnName  String  Adds a column to the result dataframe that contains the distance calculated.

Return Values

Return Type  Description
DataFrame    The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case:

Application        Do this
Spark job          Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook  Integrating the Location Intelligence Jar with Zeppelin
Hue notebook       Integrating the Location Intelligence Jar with Hue

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter      Required  Description
-f <file>      yes       Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel>  no        The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
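After applying the steps above, it can be useful to confirm that the download directory really grants read, write, and execute to both owner and group. The following is a minimal sketch using only the Python standard library; the helper name `has_owner_and_group_rwx` is hypothetical, and a temporary directory stands in for the real download location.

```python
import os
import stat
import tempfile


def has_owner_and_group_rwx(path):
    """Return True if owner and group both have read, write, and execute on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o770) == 0o770


# A temporary directory stands in for the configured download directory.
with tempfile.TemporaryDirectory() as download_dir:
    os.chmod(download_dir, 0o775)
    ok = has_owner_and_group_rwx(download_dir)
    mode_text = oct(stat.S_IMODE(os.stat(download_dir).st_mode))

print(ok, mode_text)
```

To check a real installation, pass the directory configured for downloads instead of the temporary directory.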

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator             Definition
Attribute operators  =, <>, !=, <, <=, >, >=
Between              Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects  Returns true if the envelopes (MBRs) of the operands intersect.
Contains             Returns true if the first object contains all of the second object.
Within               Returns true if the first object is entirely inside the second object.
ContainsCentroid     Returns true if the first object contains the centroid of the second object.
CentroidWithin       Returns true if the first object's centroid is within the second object.
Intersects           Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)            Returns true if the value equals at least one of the values in the literal list or sub query.
Like                 Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                  Returns true if both conditions in the WHERE clause are true.
OR                   Returns true if either the first or second condition is true.
NOT                  Reverses the meaning of the logical operator with which it is used.
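The wildcard rules for the Like operator can be illustrated by translating a pattern into a regular expression. This is a minimal sketch of the wildcard semantics described in the table, not the MI SQL engine's implementation; it assumes case-sensitive matching and no escape character, and the function names `like_to_regex` and `like` are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")  # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Canada", "Can%"), like("Canada", "C_nada"), like("Canada", "C_da"))
```

Combining the two wildcards works the same way: a pattern such as `_a%` matches any value whose second character is `a`.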

Arithmetic Operators

Operator  Definition
+         Addition; also concatenation operator. NOTE: String concatenation also uses &.
-         Subtraction
*         Multiplication
/         Division
^         Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter  Definition
( )        Expression delimiters
' '        String constant delimiters. See Quote Rules.
" "        Quoted identifier delimiters
% _        Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
,          List items and function argument separators
@          Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal 'Ohara's' is defined:

SELECT * FROM Streets WHERE Business = 'Ohara''s'
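When queries are assembled programmatically, the quote rules above can be applied with two small helpers. This is a minimal sketch; the function names `quote_literal` and `quote_identifier` are hypothetical, and because this guide does not say how a double quote inside an identifier should be escaped, the sketch rejects such identifiers instead of guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes; embedded double quotes are rejected."""
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("Ohara's")
)
print(query)
```

Quoting the identifier is harmless even when it is not strictly required, which keeps the helper simple.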

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

2 - Spatial

This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files.

Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to use the big data processing systems for spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory R-tree creation and searching.

Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.

In this section

• Installing the SDK
• Hive User-Defined Spatial Functions
• Spark

Installing the SDK

To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.

For the purposes of this guide, we will:

• use a user called pbuser
• install everything into /pb

Perform the following steps from a node in your cluster, such as the master node.

1. Create the install directory and give ownership to pbuser:

sudo mkdir /pb
sudo chown pbuser:pbuser /pb

2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:

/pb/temp/spectrum-bigdata-locationintelligence-<version>.zip

3. Extract the Location Intelligence distribution:

mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-<version>.zip -d /pb/li/software

4. Create an install directory on HDFS and give ownership to pbuser:

sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb

5. Upload the distribution into HDFS:

hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li

Hive User-Defined Spatial Functions

Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in the spectrum-bigdata-li-hive-<version>.jar.

Refer to the following overview to quickly navigate to the Hive UDFs described in this document.

Constructor Functions
    Description: Construct an instance of WritableGeometry from supported geometry representation formats
    Names: FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point

Grid Functions
    Description: Grid processing functions
    Names: GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID

Measurement Functions
    Description: Geometry measurement functions
    Names: Area, ClosestPoints, Distance, Length, Perimeter

Observer Functions
    Description: Geometry observer functions
    Names: ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin

Persistence Functions
    Description: Serialize an instance of WritableGeometry to supported geometry representation formats
    Names: ToGeoJSON, ToKML, ToWKB, ToWKT

Predicate Functions
    Description: Geometry predicate functions
    Names: Disjoint, Intersects, Overlaps, Within, IsNullGeometry

Processing Functions
    Description: Geometry processing functions
    Names: Buffer, ConvexHull, Intersection, Transform, Union

Search Functions
    Description: Spatial search functions
    Names: LocalPointInPolygon, LocalSearchNearest

Setup

This topic assumes the product is installed to /pb/li/software as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:

1. Proceed according to your platform:

   Cloudera

   Copy the Hive jar for Location Intelligence to the HiveServer node:

   /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar

   In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, then move the Hive jar into the specified folder. If the value is not set, then set it to the parent folder of the Hive jar:

   /pb/li/software/hive/lib

   Hortonworks

   On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:

   sudo mkdir /usr/hdp/current/hive-server2/auxlib

   Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

   sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib

2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:

beeline -u jdbc:hive2://localhost:10000/default -n pbuser

4. Register the spatial user-defined functions. The statements below include the temporary keyword after create, which registers temporary functions (this step must be redone for every new Hive session). Omit temporary if you want permanent functions.

create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Note:

• If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).

WritableGeometry

This is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:

To calculate the length of a geometry:

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html

Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions

Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

FromGeoJSON(String jsonGeometry)

Parameters

Parameter     Type    Description
jsonGeometry  String  The geometry in GeoJSON format.

Return Values

Return Type       Description
WritableGeometry  The geometry from GeoJSON format.

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;
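The input expected by FromGeoJSON is an ordinary GeoJSON geometry string. The sketch below checks such a string for the point case using only the Python standard library before it is loaded into a Hive table. It illustrates the input format only and is unrelated to how the UDF parses its argument; the helper name `parse_geojson_point` is hypothetical.

```python
import json


def parse_geojson_point(text):
    """Parse a GeoJSON Point string and return its (x, y) coordinates as floats."""
    obj = json.loads(text)
    if obj.get("type") != "Point":
        raise ValueError("expected a GeoJSON geometry of type Point")
    coords = obj.get("coordinates")
    if not isinstance(coords, list) or len(coords) < 2:
        raise ValueError("a Point needs at least two coordinates")
    # GeoJSON stores the X (longitude) ordinate first, then Y (latitude).
    return float(coords[0]), float(coords[1])


point = parse_geojson_point('{ "type": "Point", "coordinates": [100.0, 0.0] }')
print(point)
```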

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

FromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter  Type    Description
geometry   String  The geometry in WKT format.
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type       Description
WritableGeometry  The geometry from WKT format.

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

fromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The WKB of the geometry in byte array format (byte[ ])

CRS String Optional. The coordinate system for the geometry. Default = epsg:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKB format

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');

FromKML

Description

The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language)

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

fromKML(String geometry)

Parameters

Parameter Type Description

geometry String A KML string where only the geometry or geometry in placemark will be parsed

Return Values

Return Type Description

WritableGeometry The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter Type Description

X String or Number The X ordinate.

Y String or Number The Y ordinate.

CRS String Optional. The coordinate system for the geometry. Default = epsg:4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified X, Y coordinates. If any of the argument values is invalid, an empty geometry is returned.

Examples

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry held by the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns the KML text representation of the geometry held by the specified WritableGeometry instance. The output uses the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd).

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometries have any direct position in common; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty otherwise False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
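The ring rule in the description can be illustrated outside Hive. The following Python sketch is not the product's implementation: it assumes planar (Cartesian) coordinates and uses the shoelace formula, and the names `ring_area` and `polygon_area` are hypothetical helpers introduced only for this illustration.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) vertices."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_area(outer, [hole]))  # 96.0
```

A 10 x 10 square with a 2 x 2 hole yields 100 - 4 = 96 square units. A degenerate ring such as a two-point line yields zero, which matches the statement that points and curves have no area.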

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
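To make the difference between the CARTESIAN and SPHERICAL computation types concrete, here is a small Python sketch for two points. It is an illustration only and makes several assumptions: the spherical case uses the haversine formula on a sphere of mean radius 6371.0088 km, whereas the product may use a different Earth model, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius for this sketch


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the planar coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111.2 km on this sphere.
print(round(spherical_distance_km(0.0, 0.0, 1.0, 0.0), 1))
# Treating the same coordinates as planar gives 1.0 (in degrees, not kilometers).
print(cartesian_distance(0.0, 0.0, 1.0, 0.0))
```

The comparison shows why Cartesian logic is a poor fit for long/lat data: the result is expressed in degrees and ignores the curvature of the earth.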

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and the holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The unit type of the offset. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
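The effect of the resolution parameter can be pictured with a planar sketch. The Python below is not the product's algorithm: it assumes Cartesian coordinates and a point input, and `buffer_point` is a hypothetical helper that places `resolution` vertices evenly on a circle of radius `offset`.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Closed ring of `resolution` straight segments approximating a circle."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = buffer_point(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)  # 8 segments, so the buffer of a point is an octagon
```

Raising the resolution adds vertices, which brings the polygon closer to a true circle at the cost of more coordinates to store and process.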

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
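For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. This is a generic textbook technique chosen for illustration; the source does not state which algorithm the UDF uses, and the function names here are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order, starting at the lowest x."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(convex_hull(points))  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

The two interior points are discarded because they lie inside the square formed by the other four.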

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));

Intersection

Description

The Intersection function returns the geometry (such as a point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the ordinates from the transformed point geometry.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
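The bounds-based filtering described above can be sketched in plain Python. This is an illustration of the idea rather than of the UDFs themselves: `mbr` and `filter_by_mbr` are hypothetical helpers, the geometry is assumed to be a simple list of planar vertices, and the rows are assumed to be dictionaries with `x` and `y` keys.

```python
def mbr(coords):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep the XY rows whose point falls inside the bounds (inclusive)."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in rows if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


triangle = [(1.0, 1.0), (5.0, 2.0), (3.0, 6.0)]
rows = [
    {"id": 1, "x": 2.0, "y": 3.0},
    {"id": 2, "x": 6.0, "y": 3.0},
    {"id": 3, "x": 4.5, "y": 5.5},
]
bounds = mbr(triangle)
print(bounds)                                        # (1.0, 1.0, 5.0, 6.0)
print([r["id"] for r in filter_by_mbr(rows, bounds)])  # [1, 3]
```

An MBR test is only a coarse filter: row 3 lies inside the rectangle but outside the triangle, so an exact predicate is still needed when precise containment matters.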

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

Three types of UDFs are provided, one for each grid cell shape:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision The shape of the cell is rectangular

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into
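To make the relationship between an ID and its cell boundary concrete, here is a minimal Python sketch of how a rectangular cell can be recovered from a geohash string. It assumes the publicly documented geohash scheme (base-32 characters, bits alternating between longitude and latitude); this guide does not state that GeoHashBoundary is implemented this way, and the helper names geohash_bounds and bounds_to_wkt are hypothetical.

```python
# Sketch of the public geohash scheme; not the product's GeoHashBoundary code.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) of a standard geohash cell."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate between axes, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # each character carries 5 bits
            axis = lon if use_lon else lat
            mid = (axis[0] + axis[1]) / 2.0
            if (value >> shift) & 1:
                axis[0] = mid  # bit 1: keep the upper half of the range
            else:
                axis[1] = mid  # bit 0: keep the lower half of the range
            use_lon = not use_lon
    return (lon[0], lat[0], lon[1], lat[1])


def bounds_to_wkt(bounds):
    """Format the rectangle as a closed WKT polygon ring."""
    min_lon, min_lat, max_lon, max_lat = bounds
    corners = [(min_lon, min_lat), (max_lon, min_lat), (max_lon, max_lat),
               (min_lon, max_lat), (min_lon, min_lat)]
    return "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in corners) + "))"


print(bounds_to_wkt(geohash_bounds("dre")))
```

Each extra character adds five bits, so a longer ID always describes a smaller rectangle nested inside the cell of its prefix.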

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point
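The way precision controls the length of the returned key can be illustrated with a short Python sketch of the public geohash encoding. This is an assumption-laden illustration, not the product's implementation: it assumes GeoHashID follows the standard geohash scheme, and the function name geohash_id is hypothetical.

```python
# Sketch of the public geohash encoding; not the product's GeoHashID code.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y (numbers or numeric strings)."""
    x, y = float(x), float(y)
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    chars = []
    value, bits = 0, 0
    use_lon = True  # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        axis, coord = (lon, x) if use_lon else (lat, y)
        mid = (axis[0] + axis[1]) / 2.0
        value <<= 1
        if coord >= mid:
            value |= 1
            axis[0] = mid  # point is in the upper half
        else:
            axis[1] = mid  # point is in the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


print(geohash_id(-73.750333, 42.736103, 3))      # dre
print(geohash_id("-73.750333", "42.736103", 3))  # dre
```

Because a longer key only appends characters, IDs of nearby points share a prefix, which is what makes the keys useful for sorting and prefix searches.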

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point
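This guide does not describe how square hash IDs are constructed. As an illustration only of why a grid can look square on a Mercator map, the Python sketch below uses one widely known scheme, the quadtree key of web map tiles: the Mercator plane is split into four equal squares per level, and each level contributes one digit from 0 to 3. Treat the function name mercator_quadkey and the choice of scheme as assumptions; SquareHashID may differ.

```python
import math

# Illustrative quadtree key on the spherical Mercator plane.
# This is NOT confirmed to be the algorithm behind SquareHashID.
MAX_MERCATOR_LAT = 85.05112878  # latitude limit of the square Mercator plane


def mercator_quadkey(lon, lat, level):
    """Return a string of `level` digits (0-3) naming a square Mercator cell."""
    lat = max(min(lat, MAX_MERCATOR_LAT), -MAX_MERCATOR_LAT)
    # Normalised Mercator coordinates in [0, 1], with y growing southwards.
    x = (lon + 180.0) / 360.0
    s = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)
    n = 1 << level  # cells per side at this level
    col = min(n - 1, max(0, int(x * n)))
    row = min(n - 1, max(0, int(y * n)))
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if col & mask:
            digit += 1  # eastern half of the parent cell
        if row & mask:
            digit += 2  # southern half of the parent cell
        digits.append(str(digit))
    return "".join(digits)


print(mercator_quadkey(-73.750333, 42.736103, 3))
```

Because the subdivision happens in projected coordinates rather than in degrees, every cell has equal width and height on the Mercator map, while its extent in degrees of latitude varies.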

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format

Options

Option Description

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.
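For readers unfamiliar with point-in-polygon searches, the following Python sketch shows the underlying idea with the classic ray-casting test: every feature whose ring contains the point contributes its attributes to the result, much as each match produces a row in the lateral view above. This is not how the UDTF is implemented; the function names and the sample square regions are hypothetical.

```python
# Conceptual sketch of a point-in-polygon lookup; not the product's UDTF code.

def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the polygon ring."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does a horizontal ray from the point cross edge (j, i)?
        if (yi > y) != (yj > y):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygons_containing(x, y, features):
    """Return the attributes of every feature whose ring contains the point."""
    return [attrs for ring, attrs in features if point_in_ring(x, y, ring)]


# Hypothetical data: two square "states" with made-up attributes.
features = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], {"state": "A", "capital": "Alpha"}),
    ([(10, 0), (20, 0), (20, 10), (10, 10)], {"state": "B", "capital": "Beta"}),
]
print(polygons_containing(5, 5, features))
```

A point that falls in no polygon yields an empty list, which corresponds to no output rows for that input point.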

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries in the specified TAB or shapefile that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format

Options

Option Description

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
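The effect of the maxCandidates and maxDistance options can be pictured with a small brute-force Python sketch: candidates are ranked by distance from the search point, anything beyond the distance limit is dropped, and the list is cut to the requested size. This is a conceptual illustration under stated assumptions (great-circle distance in meters, point features only, hypothetical function names and sample cities), not the product's search algorithm.

```python
import math

# Conceptual sketch of a nearest search; not the product's UDTF code.
EARTH_RADIUS_M = 6371008.8  # mean earth radius, used for great-circle distance


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(lon, lat, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance_m) pairs, nearest first."""
    scored = [(haversine_m(lon, lat, c_lon, c_lat), name)
              for name, c_lon, c_lat in candidates]
    if max_distance_m is not None:
        scored = [item for item in scored if item[0] <= max_distance_m]
    scored.sort()
    return [(name, dist) for dist, name in scored[:max_candidates]]


# Hypothetical candidate points (name, longitude, latitude).
capitals = [
    ("Harrisburg", -76.8867, 40.2732),
    ("Albany", -73.7562, 42.6526),
    ("Hartford", -72.6734, 41.7658),
]
for name, dist in search_nearest(-73.750333, 42.736103, capitals, max_candidates=2):
    print(name, round(dist))
```

With the defaults (one candidate, no distance limit) the function returns only the single closest feature, mirroring the documented default of maxCandidates.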

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with Level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
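To show what a distance join produces, independent of Spark, here is a deliberately naive Python sketch: every pair of records whose great-circle distance is within the limit becomes one output row, optionally carrying the computed distance. It is a semantic illustration only. The real method uses a geohash-based primary join (the geohashPrecision parameter) rather than comparing every pair, and all names and sample records below are hypothetical.

```python
import math

# Naive semantic sketch of a distance join; the Spark method is not
# implemented this way (it first narrows candidates with a geohash join).
EARTH_RADIUS_M = 6371008.8
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, distance_column=None):
    """Pair each left row with every right row within max_distance_m."""
    rows = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                row = {**l, **r}
                if distance_column:
                    row[distance_column] = d  # mirrors the DistanceColumnName idea
                rows.append(row)
    return rows


customers = [{"customer": "c1", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "cafe", "lon": -73.752, "lat": 42.737},
    {"poi": "museum", "lon": -73.70, "lat": 42.70},
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE, "outputDistance")
print([(row["customer"], row["poi"]) for row in result])
```

Comparing all pairs grows with the product of the two input sizes, which is why a coarse spatial key such as a geohash is normally used to limit candidate pairs before the exact distance check.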


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin

Hue notebook Integrating the Location Intelligence Jar with Hue

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download the data and to update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
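After the last step, it can be useful to confirm that the download directory really has the expected group and mode. The sketch below is one possible check, not a product tool: the function name check_download_dir is hypothetical, and it assumes GNU coreutils stat (the -c option), which differs from BSD and macOS stat.

```shell
# Report whether a directory has the expected group and the 775 mode.
# Assumes GNU coreutils stat; "%a" is the octal mode and "%G" the group name.
check_download_dir() {
  dir=$1
  expected_group=$2
  actual=$(stat -c '%a %G' "$dir") || return 2
  if [ "$actual" = "775 $expected_group" ]; then
    echo "OK: $dir is $actual"
  else
    echo "MISMATCH: $dir is $actual, expected 775 $expected_group"
    return 1
  fi
}
```

It would be called with the download directory and the common group name, for example check_download_dir <download directory> pbdownloads, and returns a non-zero status when either the mode or the group differs.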

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value can be compared to similar values using wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
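The wildcard behavior of the Like operator can be illustrated outside of MI SQL. The sketch below is an analogy only, not how the product evaluates Like: it maps the percent sign to the shell glob * and the underscore to ?, then matches with a case statement. The function name like_match is hypothetical, and the sketch assumes the pattern contains no shell glob metacharacters (*, ?, [) or backslashes of its own.

```shell
# Return success when the value matches a Like-style pattern.
# "%" (zero or more characters) becomes "*"; "_" (one character) becomes "?".
like_match() {
  value=$1
  glob=$(printf '%s' "$2" | tr '%_' '*?')
  case $value in
    $glob) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this mapping, like_match "Hudson River" "Hud%" succeeds, like_match "cat" "c_t" succeeds, and like_match "cart" "c_t" fails because the underscore stands for exactly one character.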

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
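When query text is assembled by a script, the quoting rules above can be applied mechanically. The following is a small sketch under stated assumptions: the helper names sql_literal and sql_identifier are hypothetical, and sql_identifier assumes the identifier itself contains no double quote.

```shell
# Wrap a value in single quotes, doubling any embedded single quote.
sql_literal() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Wrap an identifier in double quotes (assumes it contains no double quote).
sql_identifier() {
  printf '"%s"\n' "$1"
}
```

A script could then build a statement such as echo "SELECT * FROM $(sql_identifier Streets) WHERE Business = $(sql_literal "$name")", where name is a shell variable holding the raw business name.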

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street

Stamford CT 06926-0700

USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.

All rights reserved

Spatial

Installing the SDK

To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.

For the purposes of this guide, we will:

• use a user called pbuser
• install everything into /pb

Perform the following steps from a node in your cluster, such as the master node:

1. Create the install directory and give ownership to pbuser:

sudo mkdir /pb
sudo chown pbuser:pbuser /pb

2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:

/pb/temp/spectrum-bigdata-locationintelligence-version.zip

3. Extract the Location Intelligence distribution:

mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-version.zip -d /pb/li/software

4. Create an install directory on HDFS and give ownership to pbuser:

sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb

5. Upload the distribution into HDFS:

hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li
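The Hive setup described later in this guide expects the Hive jar under the hive/lib folder of the extracted distribution, so a quick local check after extraction can catch an incomplete unzip. This is a sketch only: the function name find_li_hive_jar is hypothetical, and the folder layout and jar name prefix are taken from the paths shown in this guide.

```shell
# List the Location Intelligence Hive jars under an extracted distribution.
# Returns a non-zero status when no matching jar is found.
find_li_hive_jar() {
  software_dir=$1
  found=1
  for jar in "$software_dir"/hive/lib/spectrum-bigdata-li-hive-*.jar; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$jar" ] || continue
    printf '%s\n' "$jar"
    found=0
  done
  return $found
}
```

Running find_li_hive_jar against the software folder used in the steps above prints the full path of each matching jar, which is the path needed when copying the jar to the HiveServer node.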

Hive User-Defined Spatial Functions

Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in the spectrum-bigdata-li-hive-<version>.jar.

Refer to the table below to quickly navigate to the Hive UDFs described in this document.

Type | Description | Name

Constructor Functions on page 15 | Construct an instance of WritableGeometry from supported geometry representation formats | FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point

Grid Functions on page 53 | Grid processing functions | GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID

Measurement Functions on page 29 | Geometry measurement functions | Area, ClosestPoints, Distance, Length, Perimeter

Observer Functions on page 46 | Geometry observer functions | ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin

Persistence Functions on page 20 | Serialize an instance of WritableGeometry to supported geometry representation formats | ToGeoJSON, ToKML, ToWKB, ToWKT

Predicate Functions on page 24 | Geometry predicate functions | Disjoint, Intersects, Overlaps, Within, IsNullGeometry

Processing Functions on page 39 | Geometry processing functions | Buffer, ConvexHull, Intersection, Transform, Union

Search Functions on page 66 | Spatial search functions | LocalPointInPolygon, LocalSearchNearest

Setup

This topic assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8. To set up user-defined spatial functions for Hive, perform the following steps:

1. Proceed according to your platform.

On this platform Do this

Cloudera Copy the Hive jar for Location Intelligence to the HiveServer node:

/pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar

In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, then move the Hive jar into the specified folder. If the value is not set, then set it to the parent folder of the Hive jar:

/pb/li/software/hive/lib

Hortonworks On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:

sudo mkdir /usr/hdp/current/hive-server2/auxlib

Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar /usr/hdp/current/hive-server2/auxlib/

2. Restart all Hive services.

3. Launch Beeline, or some other Hive client, for the remaining step:

beeline -u jdbc:hive2://localhost:10000/default -n pbuser

4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).

create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
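Because temporary functions must be registered again in every Hive session, the registration statements in step 4 are a natural candidate for scripting. The sketch below generates them from a package name and a list of function names. The helper name register_sql is hypothetical, and it assumes the class names follow the com.pb.bigdata.spatial.hive.<package>.<Name> pattern used by the registration statements in this guide.

```shell
# Print a "create temporary function" statement for each function name.
# First argument: the package (construct, persistence, predicate, ...).
# Remaining arguments: function names, which also serve as class names.
register_sql() {
  package=$1
  shift
  for name in "$@"; do
    printf "create temporary function %s as 'com.pb.bigdata.spatial.hive.%s.%s';\n" \
      "$name" "$package" "$name"
  done
}

# Example: emit the constructor and persistence registrations.
register_sql construct FromWKT FromWKB FromKML FromGeoJSON ST_Point
register_sql persistence ToWKT ToWKB ToKML ToGeoJSON
```

The output could be redirected to a file and run at the start of a session, for instance with Beeline's -f option, instead of pasting the statements by hand.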

WritableGeometry

This is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:

To calculate the length of a geometry:

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html

Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions

Constructor Functions

The following Constructor functions are available

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

FromGeoJSON(String jsonGeometry)

Parameters

Parameter Type Description

jsonGeometry String The geometry in geoJSON format

Return Values

Return Type Description

WritableGeometry The geometry from geoJSON format

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

FromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The geometry in WKT format

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKT format

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

FromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The WKB of the geometry in byte array format (byte[ ])

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKB format

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
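The hexadecimal string in this example follows the standard WKB layout: one byte for the byte order, four bytes for the geometry type, then the coordinates as 8-byte doubles. The sketch below reads only the first two fields, which is often enough to sanity-check input before passing it to FromWKB. The function name wkb_header is hypothetical, and the sketch does not decode coordinates.

```shell
# Print the byte order and geometry type code of a hex-encoded WKB string.
wkb_header() {
  hex=$1
  order=$(printf '%s' "$hex" | cut -c1-2)
  type_hex=$(printf '%s' "$hex" | cut -c3-10)
  if [ "$order" = "01" ]; then
    # Little-endian: reverse the four type bytes before reading the number.
    type_hex=$(printf '%s' "$type_hex" | sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/')
    order_name=little-endian
  else
    order_name=big-endian
  fi
  printf '%s %d\n' "$order_name" "0x$type_hex"
}

wkb_header 010100000000000000000024400000000000002440
```

For the string used in the example, this prints little-endian 1, where type code 1 denotes a Point in WKB. The remaining 16 bytes are the X and Y ordinates, each stored as an IEEE 754 double.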

FromKML

Description

The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language)

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

fromKML(String geometry)

Parameters

Parameter Type Description

geometry String A KML string where only the geometry or geometry in placemark will be parsed

Return Values

Return Type Description

WritableGeometry The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter Type Description

X String or Number The X ordinate

Y String or Number The Y ordinate

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified X,Y coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.

Examples

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns a text formatted in GeoJSON representation of geometry from the specified WritableGeometry instance

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted in KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty otherwise False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
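The ring rule described above can be illustrated outside Hive. The following is a minimal Python sketch, assuming planar (CARTESIAN-style) coordinates and the shoelace formula; the helper names ring_area and polygon_area are hypothetical and are not part of the SDK, and the product's own computation (especially SPHERICAL) may differ.

```python
def ring_area(ring):
    """Planar area of a ring given as [(x, y), ...] using the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_area(square, [hole]))  # 100 for the square minus 4 for the hole
```

The result is in squared coordinate units; converting to a unit such as sq km is a separate step.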

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units on page 30.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.
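To make the difference between the two computation types concrete, here is a small Python sketch for point-to-point distance. It assumes a haversine great-circle formula on a sphere of mean Earth radius for the spherical case and plain Euclidean distance for the cartesian case; the function names and the radius constant are assumptions of this sketch, and the SDK's actual algorithms are not documented here.

```python
import math

# Assumption of this sketch: mean Earth radius in meters.
EARTH_RADIUS_M = 6371008.8


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle (haversine) distance in meters between two long/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the (projected) coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


# Equator to pole is a quarter of a great circle.
print(spherical_distance_m(0.0, 0.0, 0.0, 90.0))
# A 3-4-5 triangle in projected coordinates.
print(cartesian_distance(0.0, 0.0, 3.0, 4.0))
```

Treating longitude and latitude degrees as cartesian X and Y would give a result in degrees rather than a ground distance, which is why geographic coordinate systems only list SPHERICAL as valid.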

Linear Units

The following table lists the valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles
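The unit codes in this table relate to each other through fixed conversion factors. The Python sketch below uses the standard definitions of each unit in meters; the dictionary and the convert_length helper are hypothetical names for illustration only, not an SDK API.

```python
# Meters per unit, keyed by the unit codes listed in the table above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the unit codes above."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert_length(1.0, "km", "m"))
print(convert_length(1.0, "mi", "ft"))
```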

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and the holes). Curves are treated as thin polygons.
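As a rough illustration of the ring rule above, the following Python sketch sums the planar lengths of an exterior ring and its holes. It assumes cartesian coordinates; ring_length and polygon_perimeter are hypothetical helper names, not SDK functions.

```python
import math


def ring_length(ring):
    """Planar length of a ring given as [(x, y), ...], including the closing segment."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1])
    )


def polygon_perimeter(exterior, interiors=()):
    """Sum of the lengths of all rings: holes add to the perimeter."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in interiors)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_perimeter(square, [hole]))  # 40 for the square plus 8 for the hole
```

Note the contrast with area, where holes are subtracted.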

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.
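The effect of the resolution parameter on a point buffer can be sketched in a few lines of Python. This is only an illustration in planar coordinates: point_buffer is a hypothetical helper, and placing the vertices exactly on the circle is an assumption of the sketch, since this guide does not say how the SDK positions them.

```python
import math


def point_buffer(x, y, radius, resolution):
    """Closed ring of `resolution` straight segments approximating a circle."""
    ring = [
        (
            x + radius * math.cos(2 * math.pi * i / resolution),
            y + radius * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = point_buffer(0.0, 0.0, 1.0, 8)
print(len(octagon) - 1)  # number of segments
```

Doubling the resolution doubles the vertex count of every buffered point, which is the time and space cost mentioned in the parameter description.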

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
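For readers unfamiliar with the concept, the sketch below computes a planar convex hull of a set of points in Python using Andrew's monotone chain algorithm. The algorithm choice and the function names are assumptions made for illustration; the guide does not state which algorithm the SDK uses.

```python
def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order, starting from the lowest-left point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a right turn or lie on a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) is dropped because it is not needed to enclose the other points.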

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another
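As an illustration of what a coordinate system transformation involves, the Python sketch below converts a WGS 84 longitude/latitude (epsg:4326) to spherical Web Mercator (epsg:3857) with the standard closed-form projection. The helper name is hypothetical, and this covers only this one pair of systems; the SDK's Transform handles coordinate systems generally and its implementation is not shown here.

```python
import math

# Sphere radius used by Web Mercator (the WGS 84 semi-major axis), in meters.
WGS84_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project degrees of longitude/latitude to Web Mercator meters."""
    x = WGS84_RADIUS_M * math.radians(lon)
    y = WGS84_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(lonlat_to_web_mercator(30.0, 20.0))
```

The projection is undefined at the poles, where the tangent term grows without bound.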

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible: build a point from the X and Y columns, transform it, and then read the transformed ordinates back.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
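The MBR filter described above amounts to four comparisons per record. The Python sketch below shows the idea on plain coordinate lists; mbr and within_mbr are hypothetical helpers, and the sample coordinates are made up for illustration.

```python
def mbr(coords):
    """Minimum bounding rectangle of [(x, y), ...] as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def within_mbr(x, y, box):
    """True if the point lies inside or on the edge of the rectangle."""
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


# Vertices of a boundary geometry and rows of an XY table (illustrative values).
boundary = [(-74.0, 42.5), (-73.5, 42.5), (-73.5, 43.0), (-74.0, 43.0)]
rows = [(-73.750333, 42.736103), (-75.0, 40.0)]

box = mbr(boundary)
kept = [row for row in rows if within_mbr(row[0], row[1], box)]
print(box, kept)
```

An MBR test is a coarse filter: a point inside the rectangle is not necessarily inside the geometry itself.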

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid
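To show how a cell boundary can be recovered from an ID, here is a Python sketch that decodes a geohash string into its bounding rectangle. It assumes the widely used public geohash scheme (base-32 alphabet, bits alternating between longitude and latitude, longitude first); whether the SDK's IDs match this scheme exactly is an assumption, and geohash_bounds is a hypothetical helper name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(geohash):
    """Return (lon_min, lat_min, lon_max, lat_max) of the cell named by a geohash."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


print(geohash_bounds("ezs42"))
```

Each additional character halves the cell several more times, so longer IDs name smaller rectangles nested inside the cells of their prefixes.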

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point
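The relationship between a point, a precision, and the returned ID can be sketched in Python. The code assumes the widely used public geohash encoding (base-32 alphabet, interleaved longitude and latitude bits); that the SDK produces identical strings is an assumption, and geohash_id is a hypothetical helper rather than the UDF itself.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value = 0      # bits collected for the current character
    bits = 0       # how many bits have been collected so far
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)


print(geohash_id(-5.6, 42.6, 5))
print(geohash_id(-5.6, 42.6, 3))
```

A lower-precision ID is a prefix of the higher-precision ID for the same point, which is what makes the IDs convenient to sort and to search by prefix.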

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
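The core test behind a point-in-polygon search can be sketched in Python with the classic ray casting method. This is an illustration only: it handles simple polygon rings in planar coordinates, it ignores the line and point geometries that the UDTF can also return, and the function names and sample zones are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygons_containing(x, y, polygons):
    """Names of all polygons (name -> exterior ring) that contain the point."""
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]


zones = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(5, 5), (15, 5), (15, 15), (5, 15)],
}
print(polygons_containing(7, 7, zones))
```

As with the UDTF, a point that falls in overlapping polygons yields more than one result. Points exactly on a boundary are not treated consistently by this simple method.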

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node

Note If you are storing and distributing your data remotely using HDFS or S3 you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 66

Spatial

Parameter Type Description

options Map Optional Options that allow you to set return criteria in ltString Stringgt format

Options

Option Description

shpCharset the charset to use when reading a shapefile

shpCrs the coordinate reference system to use when reading a shapefile

remoteDataSourceLocation

downloadLocation

downloadGroup

the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)

the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)

Example

shpCharset utf-8

shpCrs epsg4326

remoteDataSourceLocationhdfsdatamydatazip

downloadLocationpbdownloads

Note If you are also using Spectrumtrade Geocoding for Big Data and have already set the pbdownloadlocationHive variable then you do not need to set this option here as well

downloadGrouppbdownloads

the operating system group which should be applied to downloaded data on a local file system the default is the value from the Hive strow pbdownloadgroup(required only if you are storing and distributing data remotely on HDFS or S3)

For more information see Download Permissions on page 83

Return Values

Return Type Description

geometry    The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points to search with. The pipresult.capital and pipresult.state fields come from the STATECAP.TAB file and are the fields we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries in the specified TAB or shapefile that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter    Type    Description

inputPoint    WritableGeometry    A WritableGeometry representing the point to search near.

dataSourcePath    String    The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options    Map    Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option    Description    Example

maxCandidates    The maximum number of results to return (if not set, the default value is 1).    'maxCandidates', '3'

maxDistance    The maximum distance to search for results (if not set, the default value is no limit).    'maxDistance', '25'

distanceUnit    The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.    'distanceUnit', 'mi'

returnDistanceColumnName    The name of the column to use for returning the distance.    'returnDistanceColumnName', 'Miles'

shpCharset    The charset to use when reading a shapefile.    'shpCharset', 'utf-8'

shpCrs    The coordinate reference system to use when reading a shapefile.    'shpCrs', 'epsg:4326'

remoteDataSourceLocation    The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).    'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation    The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3).    'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup    The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.    'downloadGroup', 'pbdownloads'

queryFilter    Search on a subset of the data based on an attribute.    'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry    The geometry or geometries in the specified TAB or shapefile that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park'))
nearestresult;

In the above example, id is a field from the search_points table, which supplies the points to search with. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the fields we want in the query result. In this example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter    Type    Description

df2    DataFrame    The dataframe to join to.

df1Longitude    Column    The longitude value from the first dataframe.

df1Latitude    Column    The latitude value from the first dataframe.

df2Longitude    Column    The longitude value from the second dataframe.

df2Latitude    Column    The latitude value from the second dataframe.

maxDistance    Length    The buffer length around point 1 within which to search for point 2.

geohashPrecision    Integer    The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options    Map    Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName    String    Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type    Description

DataFrame    The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join in which points from the second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
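To make the join semantics concrete, the following is a minimal Python sketch of what a distance join computes. It is not the SDK's implementation: it simply pairs every point in one list with every point in a second list that lies within a maximum distance, using the haversine great-circle formula on WGS84 longitude/latitude values. The function names, the sample coordinates, and the brute-force pairing are all assumptions made for illustration; the SDK method works on Spark dataframes and uses a geohash precision for its primary join.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles (approximation)

def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_mi):
    """Brute-force join: return every (left_id, right_id, distance) within the limit."""
    result = []
    for l_id, l_lon, l_lat in left:
        for r_id, r_lon, r_lat in right:
            d = haversine_miles(l_lon, l_lat, r_lon, r_lat)
            if d <= max_distance_mi:
                result.append((l_id, r_id, d))
    return result

# Hypothetical sample data: (id, longitude, latitude)
customers = [("c1", -73.750333, 42.736103)]
pois = [
    ("near", -73.752000, 42.737000),  # roughly a tenth of a mile away
    ("far", -74.500000, 43.500000),   # tens of miles away
]

pairs = join_by_distance(customers, pois, 0.5)
for left_id, right_id, miles in pairs:
    print(left_id, right_id, round(miles, 3))
```

A brute-force comparison like this grows with the product of the two input sizes, which is why a grid-based primary join (such as the geohash precision parameter above suggests) matters at big-data scale.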

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.

Application    Do this

Spark job    Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook    Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook    Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:

use_default_configuration=false

to:

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file is no longer used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter    Required    Description

-f <file>    yes    Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel>    no    The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
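Because the utility is run once per TAB file, indexing a whole directory of TAB files means one invocation per file. The following Python sketch only builds those command lines; it uses the -f and -p flags documented above and nothing else. The helper name pgd_commands, the assumption that the PGDBuilder executable is on the PATH, and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path

def pgd_commands(tab_dir, parallel=None, executable="PGDBuilder"):
    """Build one PGDBuilder argument list per .tab file found in tab_dir."""
    commands = []
    for path in sorted(Path(tab_dir).iterdir()):
        if path.suffix.lower() != ".tab":
            continue  # skip MAP, DAT, ID and any other non-TAB files
        cmd = [executable, "-f", str(path)]
        if parallel is not None:
            cmd += ["-p", str(parallel)]  # cap the number of concurrent processes
        commands.append(cmd)
    return commands

# Demo against a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("uk.tab", "usa.TAB", "notes.txt"):
        (Path(tmp) / name).touch()
    demo = pgd_commands(tmp, parallel=4)

for cmd in demo:
    print(" ".join(cmd))
```

Each argument list can then be handed to subprocess.run(cmd, check=True) on a machine where the utility is installed.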

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).

5. Set the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
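As a quick check on what mode 775 grants, the bits can be decoded with Python's standard stat module. This is only an illustrative sketch: it decodes the octal value and does not touch the file system or any download directory.

```python
import stat

MODE = 0o775  # the octal mode applied by chmod 775

# Render the mode the way `ls -l` would show it for a directory.
listing = stat.filemode(stat.S_IFDIR | MODE)

# Owner and group each get read, write, and execute.
owner_rwx = (MODE & stat.S_IRWXU) == stat.S_IRWXU
group_rwx = (MODE & stat.S_IRWXG) == stat.S_IRWXG

# Everyone else can read and traverse the directory but not write to it.
others_write = bool(MODE & stat.S_IWOTH)

print(listing, owner_rwx, group_rwx, others_write)
```

Full access for the group is what lets every service user in the common group update the downloaded data, while users outside the group are limited to reading it.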

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator    Definition

Attribute operators    =, <>, !=, <, <=, >, >=

Between    Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects    Returns true if the envelopes (MBRs) of the operands intersect.

Contains    Returns true if the first object contains all of the second object.

Within    Returns true if the first object is entirely inside the second object.

ContainsCentroid    Returns true if the first object contains the centroid of the second object.

CentroidWithin    Returns true if the first object's centroid is within the second object.

Intersects    Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List)    Returns true if the value equals at least one of the values in the literal list or subquery.

Like    Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND    Returns true if both conditions in the WHERE clause are true.

OR    Returns true if either the first or second condition is true.

NOT    Reverses the meaning of the logical operator with which it is used.
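The Like wildcard semantics can be illustrated with a small Python sketch that translates a pattern into a regular expression: the percent sign becomes "any run of characters" and the underscore becomes "exactly one character". The helper names are hypothetical, and the sketch assumes a case-sensitive match with no escape character, which may differ from how MI SQL evaluates Like.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)

def like_match(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like_match("Central Park", "%Park"))  # % absorbs the leading text
print(like_match("Parks", "%Park"))         # trailing character is not covered
print(like_match("cat", "c_t"))             # _ stands for exactly one character
```

Combining the two symbols works the same way; for example, the pattern "_a%" matches any value whose second character is "a".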

Arithmetic Operators

Operator    Definition

+    Addition; also the concatenation operator. NOTE: String concatenation also uses &.

-    Subtraction

*    Multiplication

/    Division

^    Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter    Definition

( )    Expression delimiters

' '    String constant delimiters. See Quote Rules on page 85.

" "    Quoted identifier delimiters

% _    Wildcard symbols. % represents zero, one, or more characters; _ (underscore) represents a single character.

,    List item and function argument separators

@    Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
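When filter expressions are assembled in code, the quoting rules above can be applied mechanically. The following Python sketch shows one way to do so; the helper names and the sample business name are hypothetical. Rejecting identifiers that contain a double quote is an assumption made here, because this guide does not describe how such identifiers are escaped.

```python
def sql_string_literal(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def sql_identifier(name):
    """Wrap an identifier in double quotes; embedded double quotes are rejected."""
    if '"' in name:
        raise ValueError("identifier contains a double quote: " + name)
    return '"' + name + '"'

# Hypothetical usage: build a filter on a business name that contains an apostrophe.
where_clause = sql_identifier("Business") + " = " + sql_string_literal("Joe's Diner")
print(where_clause)
```

Quoting values this way keeps an apostrophe in the data from terminating the string literal early.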

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved.

  • Table of Contents
  • Welcome
    • What is Spectrumtrade Location Intelligence for Big Data
    • Spectrumtrade Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
      • Spatial
        • Installing the SDK
        • Hive User-Defined Spatial Functions
          • Setup
          • WritableGeometry
          • Geometry Functions
            • Constructor Functions
              • FromGeoJSON
              • FromWKT
              • FromWKB
              • FromKML
              • ST_Point
                • Persistence Functions
                  • ToGeoJSON
                  • ToWKT
                  • ToWKB
                  • ToKML
                    • Predicate Functions
                      • Disjoint
                      • Intersects
                      • Overlaps
                      • Within
                      • IsNullGeometry
                        • Measurement Functions
                          • Area
                          • ClosestPoints
                          • Distance
                          • Length
                          • Perimeter
                            • Processing Functions
                              • Buffer
                              • ConvexHull
                              • Intersection
                              • Transform
                              • Union
                                • Observer Functions
                                  • ST_X
                                  • ST_XMax
                                  • ST_XMin
                                  • ST_Y
                                  • ST_YMax
                                  • ST_YMin
                                    • Grid Functions
                                      • GeoHashBoundary
                                      • GeoHashID
                                      • HexagonBoundary
                                      • HexagonID
                                      • SquareHashBoundary
                                      • SquareHashID
                                          • Search Functions
                                            • LocalPointInPolygon
                                            • LocalSearchNearest
                                                • Spark
                                                  • Spark Jobs
                                                    • Hexagon Generator
                                                      • Hexagons
                                                          • Spark API
                                                            • JoinByDistance
                                                              • Location Intelligence Jar for Spatial Operations
                                                                • Installing and setting up the Location Intelligence Jar for a Spark Job
                                                                • Integrating the Location Intelligence Jar with Zeppelin
                                                                • Integrating the Location Intelligence Jar with Hue
                                                                  • Appendix
                                                                    • PGD Builder
                                                                      • Building an Index with the PGD Builder
                                                                        • Download Permissions
                                                                        • Operators and Syntax Delimiters
                                                                          • Quote Rules
Hive User-Defined Spatial Functions

Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in the spectrum-bigdata-li-hive-<version>.jar.

Refer to the table below to navigate quickly to the Hive UDFs described in this document.

Type    Description    Name

Constructor Functions on page 15    Construct an instance of WritableGeometry from supported geometry representation formats    FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point

Grid Functions on page 53    Grid processing functions    GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID

Measurement Functions on page 29    Geometry measurement functions    Area, ClosestPoints, Distance, Length, Perimeter

Observer Functions on page 46    Geometry observer functions    ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin

Persistence Functions on page 20    Serialize an instance of WritableGeometry to supported geometry representation formats    ToGeoJSON, ToKML, ToWKB, ToWKT

Predicate Functions on page 24    Geometry predicate functions    Disjoint, Intersects, Overlaps, Within, IsNullGeometry

Processing Functions on page 39    Geometry processing functions    Buffer, ConvexHull, Intersection, Transform, Union

Search Functions on page 66    Spatial search functions    LocalPointInPolygon, LocalSearchNearest

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 10

Spatial

Setup

This topic assumes the product is installed to /pb/li/software as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:

1. Proceed according to your platform.

Cloudera

Copy the Hive jar for Location Intelligence to the HiveServer node:

/pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar

In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:

/pb/li/software/hive/lib/

Hortonworks

On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:

sudo mkdir /usr/hdp/current/hive-server2/auxlib/

Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/

2. Restart all Hive services.

3. Launch Beeline, or some other Hive client, for the remaining step:

beeline -u jdbc:hive2://localhost:10000/default -n pbuser

4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).

create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
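
Because every registration statement follows the same pattern, the list can be generated rather than typed by hand. The following Python sketch is illustrative only and is not part of the product. The helper name registration_statements and the abbreviated UDFS mapping are assumptions made for this example, and the dotted package names are the assumed reading of the class names in the registration statements.

```python
# Hypothetical helper (not part of the product): builds Hive registration
# statements from sub-package and function names.
BASE_PACKAGE = "com.pb.bigdata.spatial.hive"

# Abbreviated mapping of sub-package -> function names; extend as needed.
UDFS = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
}


def registration_statements(udfs, temporary=True):
    """Return one 'create [temporary] function' statement per UDF."""
    keyword = "create temporary function" if temporary else "create function"
    statements = []
    for package, names in udfs.items():
        for name in names:
            class_name = f"{BASE_PACKAGE}.{package}.{name}"
            statements.append(f"{keyword} {name} as '{class_name}';")
    return statements


if __name__ == "__main__":
    # Print the statements so they can be pasted into Beeline.
    print("\n".join(registration_statements(UDFS)))
```

Passing temporary=False yields permanent registrations, which do not need to be repeated for every new Hive session.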

Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).

WritableGeometry

This is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that perform operations on it. For example:

To calculate the length of a geometry:

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. It returns true if the geometry is NULL or empty; otherwise it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html

Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions

Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

FromGeoJSON(String jsonGeometry)

Parameters

Parameter     Type    Description
jsonGeometry  String  The geometry in GeoJSON format

Return Values

Return Type       Description
WritableGeometry  The geometry from GeoJSON format

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;
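
To make the expected shape of the GeoJSON input concrete, the following Python sketch parses a GeoJSON Point and prints it as WKT. It is only an illustration of the two text formats and says nothing about how FromGeoJSON is implemented. The function name geojson_point_to_wkt is hypothetical, and only the Point type is handled.

```python
import json


def geojson_point_to_wkt(text):
    """Convert a GeoJSON Point string to WKT; return None for other input."""
    try:
        obj = json.loads(text)
    except (TypeError, ValueError):
        return None  # not valid JSON text
    if not isinstance(obj, dict) or obj.get("type") != "Point":
        return None  # this sketch handles only Point geometries
    coords = obj.get("coordinates")
    if not isinstance(coords, list) or len(coords) < 2:
        return None
    # GeoJSON stores a position as [x, y]; WKT separates ordinates by a space.
    x, y = coords[0], coords[1]
    return f"POINT ({x} {y})"


if __name__ == "__main__":
    print(geojson_point_to_wkt('{ "type": "Point", "coordinates": [100.0, 0.0] }'))
```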

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

FromWKT(String geometry [, SpatialInfo CRS])

Parameters

Parameter  Type    Description
geometry   String  The geometry in WKT format
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type       Description
WritableGeometry  The geometry from WKT format

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

FromWKB(String geometry [, SpatialInfo CRS])

Parameters

Parameter  Type    Description
geometry   String  The WKB of the geometry in byte array format (byte[])
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type       Description
WritableGeometry  The geometry from WKB format

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
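
The hexadecimal literal in the example can be read by hand using the standard WKB layout: one byte-order byte, a four-byte geometry type, then two eight-byte doubles for a point. The following Python sketch decodes it with the standard library. It illustrates the WKB format only, not the product's parser, and the function name decode_wkb_point is hypothetical.

```python
import struct


def decode_wkb_point(hex_text):
    """Decode a hex-encoded WKB Point into an (x, y) tuple."""
    data = bytes.fromhex(hex_text)
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    order = "<" if data[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type as an unsigned 32-bit integer.
    (geom_type,) = struct.unpack(order + "I", data[1:5])
    if geom_type != 1:
        raise ValueError("only WKB Point (type 1) is handled in this sketch")
    # Bytes 5-20 hold the X and Y ordinates as IEEE 754 doubles.
    x, y = struct.unpack(order + "dd", data[5:21])
    return x, y


if __name__ == "__main__":
    # The literal from the FromWKB example decodes to the point (10.0, 10.0).
    print(decode_wkb_point("010100000000000000000024400000000000002440"))
```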

FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

FromKML(String geometry)

Parameters

Parameter  Type    Description
geometry   String  A KML string where only the geometry, or the geometry in a placemark, will be parsed

Return Values

Return Type       Description
WritableGeometry  The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y [, SpatialInfo CRS])

Parameters

Parameter  Type              Description
X          String or Number  The X ordinate
Y          String or Number  The Y ordinate
CRS        String            Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type       Description
WritableGeometry  The geometry of the specified XY coordinates. If any of the argument values are invalid, an empty geometry is returned.

Examples

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry

Return Values

Return Type     Description
GeoJSON String  The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry

Return Values

Return Type  Description
String       The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array holding the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry

Return Values

Return Type  Description
Byte[]       The WKB representation of a geometry, expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry

Return Values

Return Type  Description
String       The KML representation of a geometry, expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry

Return Values

Return Type  Description
Boolean      True if the two geometry objects have no points in common, otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry

Return Values

Return Type  Description
Boolean      True if there is any direct position in common between the two geometries, otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry

Return Values

Return Type  Description
Boolean      True if geometry1 overlaps geometry2, otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry

Return Values

Return Type  Description
Boolean      True if geometry2 entirely contains geometry1, otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter      Type              Description
inputGeometry  WritableGeometry  The input geometry to be checked for a null or empty value

Return Values

Return Type  Description
Boolean      True if the geometry is null or empty, otherwise False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
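
The relationship between the predicates, and the rule that a null input yields a null result, can be sketched with axis-aligned rectangles standing in for geometries. The Python below is a simplified model written for this guide, not the product's algorithm: the Box type and the function names are assumptions, and real geometries have boundary cases that rectangles do not capture.

```python
from typing import Optional, Tuple

# A rectangle as (xmin, ymin, xmax, ymax); a stand-in for a real geometry.
Box = Tuple[float, float, float, float]


def intersects(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """True if the boxes share at least one point; None if either is null."""
    if a is None or b is None:
        return None
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def disjoint(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """Logical opposite of intersects, with the same null propagation."""
    result = intersects(a, b)
    return None if result is None else not result


def within(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """True if box b entirely contains box a; None if either is null."""
    if a is None or b is None:
        return None
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


if __name__ == "__main__":
    print(intersects((0, 0, 2, 2), (1, 1, 3, 3)))
    print(disjoint((0, 0, 1, 1), (2, 2, 3, 3)))
    print(within((1, 1, 2, 2), (0, 0, 3, 3)))
    print(intersects(None, (0, 0, 1, 1)))
```

Note that the argument order matters for within: the first argument is the geometry being tested and the second is the container, matching the Within return value description.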

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits [, String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry
areaUnits        String            The desired return unit type. For valid values, see Area Units below.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for areaUnits are the following area units:

Value         Description
sq mi         square miles
sq km         square kilometers
sq in         square inches
sq ft         square feet
sq yd         square yards
sq mm         square millimeters
sq cm         square centimeters
sq m          square meters
sq survey ft  square US Survey feet
sq nmi        square nautical miles
acre          acres
ha            hectares

Return Values

Return Type  Description
Double       The area of the geometry

Examples

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
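
The rule that a polygon's area is its exterior ring minus its interior rings can be checked by hand in the planar (CARTESIAN) case. The following Python sketch uses the shoelace formula and is only an illustration: the function names and the partial unit table are assumptions, coordinates are assumed to be in meters, and the product's SPHERICAL computation is not modeled.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring [(x, y), ...] (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Exterior ring area minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


# Square meters per unit for a few of the area units listed in Area Units.
SQ_M_PER_UNIT = {"sq m": 1.0, "sq km": 1_000_000.0, "ha": 10_000.0}


def convert_area(area_sq_m, unit):
    """Convert an area in square meters to the requested unit."""
    return area_sq_m / SQ_M_PER_UNIT[unit]


if __name__ == "__main__":
    outer = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
    hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
    # A 10 x 10 square with a 2 x 2 hole.
    print(polygon_area(outer, [hole]))
```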

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry

Return Values

Return Type              Description
Array<WritableGeometry>  The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2
FROM hivetable t
LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
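
The idea of a closest-point pair can be illustrated in the plane with a point and a line segment. The Python below is a minimal sketch under that assumption. The function names are hypothetical, only planar coordinates are handled, and it is not the product's algorithm.

```python
def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is nearest to p (planar)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return (float(ax), float(ay))  # degenerate segment: a single point
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def closest_points(p, a, b):
    """Pair of closest points: [point on the first geometry, point on the second]."""
    return [p, closest_point_on_segment(p, a, b)]


if __name__ == "__main__":
    print(closest_points((5, 5), (0, 0), (10, 0)))
```

When the point already lies on the segment, both entries of the pair are the same location, which mirrors the shared point returned for intersecting geometries.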

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])

Parameters

Parameter        Type              Description
geometry1        WritableGeometry  The first instance of a WritableGeometry
geometry2        WritableGeometry  The second instance of a WritableGeometry
linearUnits      String            The desired return unit type. For valid values, see Linear Units below.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type  Description
Double       The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
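
The difference between the CARTESIAN and SPHERICAL computation types can be seen with two points. The Python sketch below uses Euclidean distance for the cartesian case and the haversine formula for the spherical case. Both choices are assumptions made for illustration, as are the mean Earth radius, the partial unit table, and the function names. The product may use a different spherical model, so its results can differ from this sketch.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in meters
# Meters per unit for a few of the linear units listed in Linear Units.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048, "nmi": 1852.0}


def cartesian_distance_m(p, q):
    """Straight-line distance; assumes coordinates are already in meters."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_m(p, q):
    """Haversine great-circle distance between (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def distance(p, q, unit, computation_type="SPHERICAL"):
    """Distance between two points in the requested linear unit."""
    if computation_type == "SPHERICAL":
        meters = spherical_distance_m(p, q)
    elif computation_type == "CARTESIAN":
        meters = cartesian_distance_m(p, q)
    else:
        raise ValueError(f"unknown computation type: {computation_type}")
    return meters / METERS_PER_UNIT[unit]


if __name__ == "__main__":
    # One degree of latitude along a meridian, in kilometers.
    print(distance((0, 0), (0, 1), "km"))
    # A 3-4-5 triangle in projected meters.
    print(distance((0, 0), (3, 4), "m", "CARTESIAN"))
```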

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry
linearUnits      String            The desired return unit type. For valid values, see Linear Units below.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type  Description
Double       The length of the geometry

Examples

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry
linearUnits      String            The desired return unit type. For valid values, see Linear Units below.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type  Description
Double       The perimeter of the geometry

Examples

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
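
Length and Perimeter are closely related: a polyline's length is the sum of its segment lengths, and a polygon's perimeter is the sum of the lengths of all its rings, holes included. The planar Python sketch below illustrates both definitions. The function names are hypothetical, coordinates are assumed to be in a projected system, and the SPHERICAL case is not modeled.

```python
import math


def path_length(coords):
    """Sum of straight segment lengths along consecutive coordinates."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_perimeter(exterior, interiors=()):
    """Length of the exterior ring plus the lengths of all interior rings."""
    return path_length(exterior) + sum(path_length(ring) for ring in interiors)


if __name__ == "__main__":
    # A polyline with a 3-4-5 segment followed by a vertical segment of 6.
    print(path_length([(0, 0), (3, 4), (3, 10)]))
    # A 10 x 10 square with a 2 x 2 hole: 40 for the outside, 8 for the hole.
    outer = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
    hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
    print(polygon_perimeter(outer, [hole]))
```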

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object

Function Registration

create function Buffer as compbbigdataspatialhiveprocessingBuffer

Syntax

Buffer(WritableGeometry geometry Number offset String linearUnit Number resolution [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type For valid values see Linear Units on page 40

resolution Number Specifies how many straight-line segments are used in approximating a circle For example with a resolution of 8 the buffer of a point will be an octagon Buffers with larger resolution values take more time and space to compute

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 39

Spatial

Parameter Type Description

computationType String Optional Indicates the logic to be used to interpret geometry coordinates The computation type is based on the coordinate system of the geometry being operated on

bull For geographic (longlat) coordinate systems Valid type = SPHERICAL (default)

bull For projected coordinate systems Valid types = CARTESIAN SPHERICAL (default)

bull For engineering coordinate systems Valid type = CARTESIAN (default)

CARTESIAN The geometry coordinates are interpreted using cartesian logic

SPHERICAL The geometry coordinates are interpreted using spherical logic

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 40

Spatial

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(tgeometryepsg4267) 50 km 12 SPHERICAL ) FROM hivetablet

SELECT Buffer(ST_POINT(5 6 epsg4267) 50 km 12 SPHERICAL )


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
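To make the definition of a convex hull concrete, here is a minimal Python sketch using Andrew's monotone chain algorithm on planar coordinates. This is a standard textbook technique chosen for illustration; the guide does not say which algorithm the ConvexHull UDF uses, and the function name convex_hull is hypothetical.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order.

    Uses Andrew's monotone chain algorithm; collinear points on the hull
    boundary are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# Vertices of the MULTIPOLYGON used in the example above.
vertices = [(40, 40), (20, 45), (45, 30), (20, 35), (10, 30), (10, 10),
            (30, 5), (45, 20), (30, 20), (20, 15), (20, 25)]
print(convex_hull(vertices))
```

Interior vertices, such as those of the hole in the second polygon, do not appear in the result because they lie strictly inside the hull.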


Intersection

Description

The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
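For readers unfamiliar with what a transformation to epsg:3857 involves, the sketch below shows the standard spherical Web Mercator forward formulas in Python. It is an illustration of the target coordinate system only, assuming WGS 84 longitude/latitude input in degrees; it is not the code behind the Transform UDF, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857

def lonlat_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y

# The point used in the second example above.
print(lonlat_to_web_mercator(30.0, 20.0))
```

Latitudes at or near the poles are outside the usable range of this projection, because the logarithm grows without bound as the latitude approaches 90 degrees.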


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that it cannot, on its own, transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make that possible by extracting the ordinates of the transformed point.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a writable geometry.

The following Observer functions are available:

• ST_X

• ST_XMax

• ST_XMin

• ST_Y

• ST_YMax

• ST_YMin
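The MBR filtering use case can be sketched in a few lines of Python. This is a conceptual illustration on plain coordinate tuples, not a use of the Hive UDFs; the function names mbr and filter_by_mbr are hypothetical.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

def filter_by_mbr(records, bounds):
    """Keep the (x, y) records that fall inside or on the given bounds."""
    x_min, y_min, x_max, y_max = bounds
    return [(x, y) for x, y in records
            if x_min <= x <= x_max and y_min <= y <= y_max]

polygon = [(10, 10), (30, 5), (45, 20), (20, 45), (10, 10)]
xy_table = [(15, 15), (50, 50), (44, 6), (5, 20)]

bounds = mbr(polygon)
print(bounds)                           # (10, 5, 45, 45)
print(filter_by_mbr(xy_table, bounds))  # [(15, 15), (44, 6)]
```

An MBR test is a coarse filter: the record (44, 6) is inside the rectangle but may still lie outside the polygon itself, so an exact geometry test is usually applied to the survivors.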


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)

• hexagon (hexagon hash)

• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash, but has the advantage that when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary

• GeoHashID

• HexagonBoundary

• HexagonID

• SquareHashBoundary

• SquareHashID
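The encode and decode round trip that the ID and Boundary functions provide can be sketched with the publicly documented geohash scheme. The Python below is an illustration only: it assumes the rectangular hash follows standard geohash (base-32 alphabet, longitude bit first), which this guide implies but does not spell out, and the names geohash_id and geohash_bounds are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_id(lon, lat, precision):
    """Encode a WGS 84 point as a standard geohash string of given length."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2.0
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2.0
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

def geohash_bounds(hash_id):
    """Decode a geohash into (min_lon, min_lat, max_lon, max_lat)."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    use_lon = True
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2.0
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2.0
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi

print(geohash_id(-73.750333, 42.736103, 3))  # dre
print(geohash_bounds("dre"))
```

Because each additional character subdivides the previous cell, IDs that share a prefix identify nested cells, which is what makes the IDs sortable and useful as aggregation keys.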


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon

• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.


options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
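The containment test at the heart of a point-in-polygon search can be sketched with the classic ray-casting technique. The Python below is a conceptual illustration on planar coordinates and in-memory data; it is not how LocalPointInPolygon reads TAB files or evaluates geometries, and the names point_in_ring and find_containing are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a horizontal ray crosses.

    `ring` is a list of (x, y) vertices; it may be closed (first vertex
    repeated at the end) or open. An odd crossing count means inside.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y):  # edge straddles the ray's y value
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside

def find_containing(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    x, y = point
    return [attrs for ring, attrs in features if point_in_ring(x, y, ring)]

# Hypothetical features standing in for rows of a polygon data source.
features = [
    ([(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)], {"state": "A"}),
    ([(3, 3), (8, 3), (8, 8), (3, 8), (3, 3)], {"state": "B"}),
]
print(find_containing((2, 2), features))      # [{'state': 'A'}]
print(find_containing((3.5, 3.5), features))  # both features overlap here
```

As in the UDTF, one input point can yield several output rows when polygons overlap. Points exactly on an edge are a boundary case that this simple test does not handle consistently.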


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
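How the maxCandidates and maxDistance options interact can be sketched with a brute-force nearest search in Python. This is an illustration under stated assumptions: candidates are plain points rather than arbitrary geometries, distances are great-circle distances in meters on a spherical earth, and the names haversine_m and search_nearest are hypothetical. It is not the UDTF's implementation.

```python
import math

MEAN_EARTH_RADIUS_M = 6371008.8

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2
    return 2.0 * MEAN_EARTH_RADIUS_M * math.asin(math.sqrt(a))

def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    `candidates` is a list of (name, lon, lat). When max_distance_m is
    None there is no distance limit, mirroring the documented default.
    """
    lon, lat = point
    scored = [(haversine_m(lon, lat, c_lon, c_lat), name)
              for name, c_lon, c_lat in candidates]
    if max_distance_m is not None:
        scored = [s for s in scored if s[0] <= max_distance_m]
    scored.sort()
    return [(name, dist) for dist, name in scored[:max_candidates]]

capitals = [("A", 0.0, 0.0), ("B", 0.0, 1.0), ("C", 0.0, 2.0)]
print(search_nearest((0.0, 0.1), capitals, max_candidates=3))
```

With a distance limit in place, fewer than max_candidates results can come back, including none at all; the LATERAL VIEW OUTER form in the example above keeps the search point in the output in that case.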


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.
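The command can also be assembled from variables, which helps when the install path or the driver changes between environments. The following is a minimal sketch and not part of the product: the function name build_spark_submit and the sample paths and class name are hypothetical, and the function only prints the command so that it can be reviewed before it is run.

```shell
# Print a spark-submit command line for a driver that needs the LI jar.
# Arguments: <li-jar-path> <driver-class> <driver-jar> [driver parameters...]
# All names are illustrative; nothing is submitted to the cluster.
build_spark_submit() {
  li_jar="$1"
  driver_class="$2"
  driver_jar="$3"
  shift 3
  printf 'spark-submit --class %s --master yarn --deploy-mode cluster --jars %s %s' \
    "$driver_class" "$li_jar" "$driver_jar"
  # Append any remaining arguments as driver parameters.
  for param in "$@"; do
    printf ' %s' "$param"
  done
  printf '\n'
}

# Example: print the command for review (hypothetical paths and class).
build_spark_submit /opt/li/li-sdk.jar com.example.MyDriver /opt/app/driver.jar in.csv out
```

Printing first, rather than executing, keeps the sketch safe to run on any machine, including one without Spark installed.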


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
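For the EMR variant of step 2, the edit to the ini file can be previewed with sed before the file is changed by hand. This is a sketch under assumptions: the function name enable_default_configuration is hypothetical, and it assumes the setting is commented out with leading # or ; characters. The function prints the modified content and leaves the file untouched.

```shell
# Print the contents of a hue.ini-style file with the
# use_default_configuration line uncommented and set to true.
# Leading indentation is preserved; the input file is not modified.
enable_default_configuration() {
  sed 's/^\([[:space:]]*\)[#;[:space:]]*use_default_configuration[[:space:]]*=[[:space:]]*false/\1use_default_configuration=true/' "$1"
}

# Usage (paths are examples only):
#   enable_default_configuration /etc/hue/conf/hue.ini > /tmp/hue.ini.new
#   diff /etc/hue/conf/hue.ini /tmp/hue.ini.new
```

Writing to a separate file and comparing it with the original makes it easy to confirm that only the intended line changed before replacing the live configuration.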


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.


-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
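Because one invocation handles a single TAB file, a directory of TAB files needs one command per file. The sketch below is a hedged illustration on a Unix-like shell, not a documented feature: the function name pgd_commands is hypothetical, it assumes the files use a lowercase .tab extension, and it only prints the PGDBuilder commands (using the -f and -p options described above) so they can be reviewed first.

```shell
# Print one PGDBuilder command per .tab file found in a directory.
# Arguments: <directory> <max-parallel>
# Assumes lowercase ".tab" extensions; paths containing spaces would
# need quoting before the printed commands are executed.
pgd_commands() {
  dir="$1"
  parallel="$2"
  for tab in "$dir"/*.tab; do
    # Skip the unexpanded pattern when no TAB files are present.
    [ -f "$tab" ] || continue
    printf 'PGDBuilder -f %s -p %s\n' "$tab" "$parallel"
  done
}

# Usage (example directory only):
#   pgd_commands /data/tabs 4
```

Since each PGD must be regenerated whenever its TAB changes, a listing like this is also a convenient checklist after a data refresh.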


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
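After the last step it can be useful to confirm that the download directory really carries 775 permissions. The following is a small sketch, with the hypothetical function name has_mode_775; it reads the permission string reported by ls and does not check group ownership.

```shell
# Succeed (exit status 0) only if the argument is a directory whose
# permission bits are rwxrwxr-x, that is, mode 775.
has_mode_775() {
  [ -d "$1" ] || return 1
  # The first 10 characters of "ls -ld" are the type and permission bits.
  perms=$(ls -ld "$1" | cut -c1-10)
  [ "$perms" = "drwxrwxr-x" ]
}

# Usage (example path only):
#   has_mode_775 /pb/downloads && echo "permissions OK"
```

A directory with additional special bits set (for example setgid) would report a different permission string and fail this check, so treat a failure as a prompt to inspect the directory rather than as proof of a problem.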


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first objects centroid is within the second object

Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object

In (List) Returns true if equals at least one of the values in the literal list or sub query

Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
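The wildcard behavior of the Like operator can be tried out with ordinary shell pattern matching. This is only an illustration of the matching rules described above, not the MI SQL implementation: the function name like_match is hypothetical, and the sketch assumes the pattern contains no shell glob characters (*, ?, [) of its own.

```shell
# Return success if VALUE matches a Like-style PATTERN, where
# % stands for zero or more characters and _ for exactly one character.
# Arguments: <value> <pattern>
like_match() {
  value="$1"
  # Translate the Like wildcards to shell glob syntax: % -> *, _ -> ?
  glob=$(printf '%s' "$2" | sed -e 's/%/*/g' -e 's/_/?/g')
  case "$value" in
    $glob) return 0 ;;
    *) return 1 ;;
  esac
}

like_match "Albany" "Al%"    && echo "Al% matches Albany"
like_match "Albany" "A_bany" && echo "A_bany matches Albany"
like_match "Albany" "A_any"  || echo "A_any does not match Albany"
```

The last call fails because the underscore stands for exactly one character, while two characters ("lb") would be needed to complete the match.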


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples


Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
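When a query string is built in a script, the doubling of single quotes can be automated. The helper below is a minimal sketch of that rule; the function name sql_quote is hypothetical and it handles only the single-quote case described above.

```shell
# Wrap a value in single quotes for use as a string literal,
# doubling every single quote it contains.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Build a WHERE clause with a safely quoted literal.
printf 'SELECT * FROM Streets WHERE Business = %s\n' "$(sql_quote "O'hara's")"
```

Keeping the quoting in one helper avoids hand-editing literals, which is where a stray unescaped quote usually slips in.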


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Type Description Name

Observer Functions on page 46 Geometry observer functions ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin

Persistence Functions on page 20 Serialize an instance of WritableGeometry to supported geometry representation formats ToGeoJSON, ToKML, ToWKB, ToWKT

Predicate Functions on page 24 Geometry predicate functions Disjoint, Intersects, Overlaps, Within, IsNullGeometry

Processing Functions on page 39 Geometry processing functions Buffer, ConvexHull, Intersection, Transform, Union

Search Functions on page 66 Spatial search functions LocalPointInPolygon, LocalSearchNearest


Setup

This topic assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8. To set up user-defined spatial functions for Hive, perform the following steps:

1. Proceed according to your platform.

On this platform Do this

Cloudera Copy the Hive jar for Location Intelligence to the HiveServer node:

/pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar

In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:

/pb/li/software/hive/lib/

Hortonworks On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:

sudo mkdir /usr/hdp/current/hive-server2/auxlib/

Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/

2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:

beeline -u jdbc:hive2://localhost:10000/default -n pbuser

4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).

create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';


create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
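Since every registration statement follows the same pattern, the list can be generated rather than typed. The sketch below is an illustration only: the function name register_stmt is hypothetical, and it assumes the class names follow the package layout shown in the listing (com.pb.bigdata.spatial.hive, then the function category, then the function name). The output could be saved to a file and run from a Hive client.

```shell
# Print a "create function" statement for one spatial UDF.
# Arguments: <category> <function-name>, for example: grid HexagonID
# The package prefix is an assumption taken from the listing above.
register_stmt() {
  printf "create function %s as 'com.pb.bigdata.spatial.hive.%s.%s';\n" "$2" "$1" "$2"
}

# Generate the statements for the constructor functions.
for fn in FromWKT FromWKB FromKML FromGeoJSON ST_Point; do
  register_stmt construct "$fn"
done
```

The same loop can be repeated for the other categories (persistence, predicate, measurement, processing, observer, grid, search) with their respective function names.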

Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).


WritableGeometry

This is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:

To calculate the length of a geometry:

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html


Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions


Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

fromGeoJSON(String jsonGeometry)

Parameters

Parameter Type Description

jsonGeometry String The geometry in geoJSON format

Return Values

Return Type Description

WritableGeometry The geometry from geoJSON format

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;


FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

fromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The geometry in WKT format

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKT format

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');


FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

fromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The WKB of the geometry in byte array format (byte[ ])

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKB format

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');


FromKML

Description

The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language)

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

fromKML(String geometry)

Parameters

Parameter Type Description

geometry String A KML string where only the geometry or geometry in placemark will be parsed

Return Values

Return Type Description

WritableGeometry The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;


ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter Type Description

X String or Number The X ordinate

Y String or Number The Y ordinate

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified X,Y coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.

Examples

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;


Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKB

Description

The ToWKB function returns the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance as a byte array.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToKML

Description

The ToKML function returns the KML text representation of the geometry in the specified WritableGeometry instance, using the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd).

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometries have any direct position in common; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));


Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;
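As a rough illustration of what a Within test decides for the point-in-polygon case used in the second example, the Python sketch below applies the classic ray-casting technique to planar coordinates. It is not the product's implementation; the function names are hypothetical, the ring is assumed to be closed (first vertex repeated at the end), and points exactly on the boundary are not treated specially.

```python
def point_in_ring(point, ring):
    """Ray casting: count how many ring edges a horizontal ray from the point crosses."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the ray's Y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def within(point, ring):
    """Mirror the documented null handling: a null input yields None."""
    if point is None or ring is None:
        return None
    return point_in_ring(point, ring)


risk_zone = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
print(within((5, 5), risk_zone))   # True
print(within((15, 5), risk_zone))  # False
print(within(None, risk_zone))     # None
```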


IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty otherwise False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;


Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units.


computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres


ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
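The rule that a polygon's area is its exterior ring minus its interior rings can be sketched for the CARTESIAN case with the shoelace formula. The Python below is an illustration under that planar assumption, not the product's algorithm; the function names are hypothetical and each ring is assumed to be closed.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring, using the shoelace formula."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
print(polygon_area(exterior, [hole]))  # 96.0
```

The result is in squared coordinate units; converting to a unit such as sq mi or ha is a separate scaling step.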


ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;


Distance

Description

The Distance function calculates and returns the distance between two geometries

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
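The difference between the two computation types can be illustrated for a pair of points. The Python sketch below uses the haversine formula for the spherical case and the Pythagorean formula for the Cartesian case. Both are common textbook choices assumed here for illustration; the guide does not state which formulas or earth model the Distance function uses, and the function names and radius constant are hypothetical.

```python
import math

# Mean earth radius in meters; an assumption made for this sketch.
EARTH_RADIUS_M = 6371008.8


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)


# A quarter of the way around the equator versus a 3-4-5 triangle on a plane.
print(spherical_distance_m(0, 0, 90, 0))
print(cartesian_distance(0, 0, 3, 4))  # 5.0
```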


Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The unit type of the offset distance. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.


computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles


Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
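The effect of the resolution parameter can be pictured with a planar point buffer. The Python sketch below places resolution vertices evenly on a circle of radius offset, so a resolution of 8 gives an octagon as the parameter description states. It is a Cartesian illustration only; where the product places the vertices is an assumption, and buffer_point is a hypothetical name.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` straight segments."""
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = buffer_point(0.0, 0.0, 1.0, 8)
print(len(octagon) - 1)  # 8 segments
```

Doubling the resolution doubles the number of vertices in the result, which is why larger values cost more time and space.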


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
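For readers unfamiliar with the term, the Python sketch below computes a planar convex hull with Andrew's monotone chain algorithm. This is a well-known technique chosen for illustration; the guide does not say which algorithm the ConvexHull function uses, and the function name is hypothetical.

```python
def convex_hull(points):
    """Return the hull vertices in counterclockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counterclockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Interior points (2, 2) and (1, 3) are dropped; only the corners remain.
print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]))
```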


Intersection

Description

The Intersection function returns the geometry (such as a point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_Point(30, 20), 'epsg:3857');
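To give a feel for what the epsg:4326 to epsg:3857 transformation in these examples does to a point, the Python sketch below applies the standard spherical Web Mercator forward formulas. It illustrates the published projection math only; it is not the product's transformation engine, and to_web_mercator is a hypothetical name.

```python
import math

# Sphere radius used by EPSG:3857 (the WGS 84 semi-major axis, in meters).
RADIUS_M = 6378137.0


def to_web_mercator(lon, lat):
    """Project a long/lat point in degrees to Web Mercator meters."""
    x = RADIUS_M * math.radians(lon)
    y = RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# The same point as in the second Transform example.
print(to_web_mercator(30, 20))
```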


Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another with Transform alone. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer index functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
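The idea of filtering an XY table by the bounds of a geometry can be sketched in a few lines of Python. The helper names mbr and filter_by_mbr are hypothetical and the data is made up for illustration; in Hive, the four bounds would come from ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(records, bounds):
    """Keep only the XY records that fall inside the bounding rectangle."""
    x_min, y_min, x_max, y_max = bounds
    return [(x, y) for x, y in records if x_min <= x <= x_max and y_min <= y <= y_max]


polygon = [(0, 0), (10, 0), (10, 5), (0, 5), (0, 0)]
xy_table = [(1, 1), (11, 2), (5, 6), (10, 5)]

bounds = mbr(polygon)
print(bounds)                           # (0, 0, 10, 5)
print(filter_by_mbr(xy_table, bounds))  # [(1, 1), (10, 5)]
```

An MBR filter is only a coarse test: a record inside the rectangle may still lie outside the geometry itself, so an exact predicate such as Within is usually applied afterwards.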


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID("-73.750333", "42.736103", 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
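For readers who want to see how a geohash ID is derived from a coordinate pair, here is a minimal Python sketch of the widely used base-32 geohash encoding. It assumes that the "well-known" ID described above follows this standard scheme; the function name is hypothetical and this is not the product's implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash of `precision` characters."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, x) if use_lon else (lat_range, y)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_id(-73.750333, 42.736103, 3))  # prints "dre"
```

Because a shorter geohash is always a prefix of the longer geohash for the same point, sorting or grouping by ID keeps nearby cells together, which is what makes the aggregation queries above practical.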


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary("PF625028642");

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary("03332");

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)


Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID("-73.750333", "42.736103", 3);

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF function returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. This UDTF returns all geometries that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.


options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP.TAB file and are the values we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
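The core test behind a point-in-polygon search can be sketched in a few lines of Python. This is a generic ray-casting illustration, not the UDTF's implementation: the function names and the sample features are hypothetical, polygons are simple rings without holes, and points lying exactly on a boundary are not handled the way the UDTF describes.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def local_point_in_polygon(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    x, y = point
    return [attrs for ring, attrs in features if point_in_ring(x, y, ring)]


# Hypothetical features: (ring of (x, y) vertices, attribute dictionary)
FEATURES = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], {"state": "A"}),
    ([(3, 3), (6, 3), (6, 6), (3, 6)], {"state": "B"}),
]

if __name__ == "__main__":
    print(local_point_in_polygon((3.5, 3.5), FEATURES))
```

As with the UDTF, one input point can yield several result rows when features overlap, and none when the point falls outside every feature.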


LocalSearchNearest

Description

The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
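The effect of the maxCandidates and maxDistance options can be illustrated with a brute-force nearest search in Python. This sketch is only an analogy: it works on point features with great-circle (haversine) distances in meters, whereas the UDTF searches arbitrary geometries with its own distance calculation. All names and sample data are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, name) pairs, nearest first.

    Candidates farther than max_distance_m are dropped; None means no limit.
    """
    scored = []
    for name, lon, lat in candidates:
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return scored[:max_candidates]


# Hypothetical point features: (name, longitude, latitude)
CANDIDATES = [("A", 0.0, 0.0), ("B", 0.0, 1.0), ("C", 0.0, 5.0)]

if __name__ == "__main__":
    for dist, name in search_nearest((0.0, 0.1), CANDIDATES, max_candidates=3,
                                     max_distance_m=150_000):
        print(name, round(dist))
```

With the defaults, only the single nearest candidate is returned, mirroring the documented default of 1 for maxCandidates and no limit for maxDistance.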


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values (one set from each dataframe) that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
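The semantics of a distance join can be shown without Spark. The Python sketch below pairs every row of one list with every row of another that lies within a maximum distance, and optionally records the distance in a named column, much as the DistanceColumnName option does. It is a plain nested loop under stated assumptions: haversine distance in meters, dictionaries with "lon" and "lat" keys, and hypothetical function names and sample data. The real method uses a geohash-based primary join to avoid comparing every pair.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(rows1, rows2, max_distance_m, distance_column=None):
    """Join rows whose points are within max_distance_m of each other."""
    result = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_m(r1["lon"], r1["lat"], r2["lon"], r2["lat"])
            if d <= max_distance_m:
                joined = {f"left_{k}": v for k, v in r1.items()}
                joined.update({f"right_{k}": v for k, v in r2.items()})
                if distance_column is not None:
                    joined[distance_column] = d
                result.append(joined)
    return result


# Hypothetical sample data
CUSTOMERS = [{"id": 1, "lon": -73.750333, "lat": 42.736103}]
POIS = [
    {"name": "near", "lon": -73.752, "lat": 42.737},
    {"name": "far", "lon": -73.9, "lat": 42.9},
]

if __name__ == "__main__":
    half_mile = 0.5 * METERS_PER_MILE
    for row in join_by_distance(CUSTOMERS, POIS, half_mile, "outputDistance"):
        print(row["left_id"], row["right_name"], round(row["outputDistance"]))
```

A nested loop compares every pair of rows, which is why a production join first buckets points by geohash cell and only then applies the exact distance test.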


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
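To see what the 775 mode in step 5 grants, the following Python sketch decodes an octal mode into the familiar rwx notation using only the standard library. The helper names are hypothetical; the sketch inspects a mode value and does not touch any directory.

```python
import stat


def describe_dir_mode(mode):
    """Return the ls-style permission string for a directory with this mode."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True if members of the owning group may create or replace entries."""
    return bool(mode & stat.S_IWGRP)


if __name__ == "__main__":
    print(describe_dir_mode(0o775))  # prints "drwxrwxr-x"
    print(group_can_write(0o775))    # prints "True"
```

Mode 775 gives the owner and the group full read, write, and execute access, while other users can only read and traverse the directory. That is why every service user that downloads data needs to be a member of the common group.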


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first objects centroid is within the second object

Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or subquery

Like Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
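The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is an illustration of the matching rules only, not the product's implementation; the function names are hypothetical, and case-sensitive matching is an assumption because this table does not state how case is handled.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    '%' matches zero, one, or multiple characters; '_' matches exactly one.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Troy", "T%"))    # True: % absorbs "roy"
print(like("Troy", "T_oy"))  # True: _ stands for the single character "r"
print(like("Troy", "T_y"))   # False: _ cannot stand for two characters
```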


Arithmetic Operators

Operator    Definition

+    Addition; also the concatenation operator. Note: String concatenation also uses &.

-    Subtraction

*    Multiplication

/    Division

^    Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter    Definition

( )    Expression delimiters

' '    String constant delimiters. See Quote Rules.

" "    Quoted identifier delimiters

%, _    Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

,    List item and function argument separators

@    Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces in their names or other special characters.

Examples


Identifiers or illegal characters (where they normally would not be allowed, such as "/") are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
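When a query is assembled by a program, the quote rules above can be applied with two small helpers. The following Python sketch is illustrative only: the function names are hypothetical, and rejecting identifiers that contain a double quote is an assumption, since this guide does not say how such identifiers are escaped.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes.

    Assumption: identifiers containing a double quote are rejected, because
    the quote rules do not describe an escape for that case.
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
print("SELECT * FROM " + table + " WHERE Country = " + quote_literal("Canada"))
print("SELECT * FROM Streets WHERE Business = " + quote_literal("O'haras"))
```

The second statement printed reproduces the O'haras example, with the embedded single quote doubled.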


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes

3001 Summer Street

Stamford CT 06926-0700

USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.

All rights reserved


Setup

This topic assumes the product is installed to /pb/li/software, as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:

1. Proceed according to your platform.

On Cloudera:

Copy the Hive jar for Location Intelligence to the HiveServer node:

/pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar

In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:

/pb/li/software/hive/lib/

On Hortonworks:

On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:

sudo mkdir /usr/hdp/current/hive-server2/auxlib/

Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/

2. Restart all Hive services.

3. Launch Beeline, or some other Hive client, for the remaining step:

beeline -u jdbc:hive2://localhost:10000/default -n pbuser

4. Register the spatial user-defined functions. The statements below include the temporary keyword after create, so they must be rerun in every new Hive session; omit temporary to register permanent functions.

create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
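Because every registration statement follows the same pattern, the list can be generated rather than typed. The Python sketch below is one possible way to do that: the package prefix and the function names are transcribed from the statements in this topic (written with dots as package separators), while the helper name registration_statements and the idea of saving the output to a script for a Hive client are assumptions made for illustration.

```python
PACKAGE = "com.pb.bigdata.spatial.hive"

# Sub-package -> function names, as listed in the registration step.
FUNCTIONS = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
    "measurement": ["Area", "ClosestPoints", "Distance", "Length", "Perimeter"],
    "processing": ["ConvexHull", "Intersection", "Buffer", "Union", "Transform"],
    "observer": ["ST_X", "ST_XMax", "ST_XMin", "ST_Y", "ST_YMax", "ST_YMin"],
    "grid": ["GeoHashBoundary", "GeoHashID", "HexagonBoundary", "HexagonID",
             "SquareHashBoundary", "SquareHashID"],
    "search": ["LocalSearchNearest", "LocalPointInPolygon"],
}


def registration_statements(temporary=True):
    """Build one create-function statement per spatial UDF."""
    keyword = "temporary " if temporary else ""
    return [
        f"create {keyword}function {name} as '{PACKAGE}.{group}.{name}';"
        for group, names in FUNCTIONS.items()
        for name in names
    ]


if __name__ == "__main__":
    # Print the statements so they can be saved to a script file.
    print("\n".join(registration_statements()))
```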

Note:

• If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).


WritableGeometry

This is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that perform some operation on it. For example:

To calculate the length of a geometry:

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html


Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions


Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

FromGeoJSON(String jsonGeometry)

Parameters

Parameter    Type    Description

jsonGeometry    String    The geometry in GeoJSON format.

Return Values

Return Type    Description

WritableGeometry    The geometry from GeoJSON format.

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;


FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

FromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter    Type    Description

geometry    String    The geometry in WKT format.

CRS    String    Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type    Description

WritableGeometry    The geometry from WKT format.

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');


FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

FromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter    Type    Description

geometry    String    The WKB of the geometry in byte array format (byte[]).

CRS    String    Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type    Description

WritableGeometry    The geometry from WKB format.

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
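To check what the hexadecimal string in the example encodes before passing it to FromWKB, it can be decoded outside Hive. The Python sketch below follows the standard WKB layout for a two-dimensional point (one byte-order byte, a four-byte geometry type, then two eight-byte doubles). It handles only that case, and the helper name decode_wkb_point is hypothetical.

```python
import struct


def decode_wkb_point(hex_wkb):
    """Decode a hex-encoded WKB 2D point and return its (x, y) ordinates."""
    data = bytes.fromhex(hex_wkb)
    # First byte: 1 = little-endian (NDR), 0 = big-endian (XDR).
    byte_order = "<" if data[0] == 1 else ">"
    (geometry_type,) = struct.unpack_from(byte_order + "I", data, 1)
    if geometry_type != 1:
        raise ValueError("this sketch only handles 2D points (WKB type 1)")
    x, y = struct.unpack_from(byte_order + "dd", data, 5)
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))  # (10.0, 10.0)
```

The example value therefore corresponds to the WKT POINT (10 10).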


FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

FromKML(String geometry)

Parameters

Parameter    Type    Description

geometry    String    A KML string, where only the geometry or the geometry in a placemark will be parsed.

Return Values

Return Type    Description

WritableGeometry    The geometry from KML format.

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter    Type    Description

X    String or Number    The X ordinate.

Y    String or Number    The Y ordinate.

CRS    String    Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type    Description

WritableGeometry    The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned.

Examples

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;


Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of a geometry from the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter    Type    Description

geometry    WritableGeometry    The instance of a WritableGeometry.

Return Values

Return Type    Description

GeoJSON String    The GeoJSON representation of a geometry.

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter    Type    Description

geometry    WritableGeometry    The instance of a WritableGeometry.

Return Values

Return Type    Description

String    The WKT representation of a geometry.

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter    Type    Description

geometry    WritableGeometry    The instance of a WritableGeometry.

Return Values

Return Type    Description

Byte[]    The WKB representation of a geometry, expressed as a byte array.

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToKML

Description

The ToKML function returns text formatted in KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter    Type    Description

geometry    WritableGeometry    The instance of a WritableGeometry.

Return Values

Return Type    Description

String    The KML representation of a geometry, expressed as a hexadecimal encoded string.

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter    Type    Description

geometry1    WritableGeometry    The first instance of a WritableGeometry.

geometry2    WritableGeometry    The second instance of a WritableGeometry.

Return Values

Return Type    Description

Boolean    True if the two geometry objects have no points in common; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter    Type    Description

geometry1    WritableGeometry    The first instance of a WritableGeometry.

geometry2    WritableGeometry    The second instance of a WritableGeometry.

Return Values

Return Type    Description

Boolean    True if there is any direct position in common between the two geometries; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));


Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter    Type    Description

geometry1    WritableGeometry    The first instance of a WritableGeometry.

geometry2    WritableGeometry    The second instance of a WritableGeometry.

Return Values

Return Type    Description

Boolean    True if geometry1 overlaps geometry2; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter    Type    Description

geometry1    WritableGeometry    The first instance of a WritableGeometry.

geometry2    WritableGeometry    The second instance of a WritableGeometry.

Return Values

Return Type    Description

Boolean    True if geometry2 entirely contains geometry1; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;


IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter    Type    Description

inputGeometry    WritableGeometry    The input geometry to be checked for a null or empty value.

Return Values

Return Type    Description

Boolean    True if the geometry is null or empty; otherwise, False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;


Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter    Type    Description

geometry    WritableGeometry    The input geometry.

areaUnits    String    The desired return unit type. For valid values, see Area Units below.

computationType    String    Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units:

Value    Description

sq mi    square miles

sq km    square kilometers

sq in    square inches

sq ft    square feet

sq yd    square yards

sq mm    square millimeters

sq cm    square centimeters

sq m    square meters

sq survey ft    square US Survey feet

sq nmi    square nautical miles

acre    acres

ha    hectares

Return Values

Return Type    Description

Double    The area of the geometry.

Examples

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
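The rule that a polygon's area is its exterior ring minus its interior rings can be sketched in Python for the CARTESIAN case using the shoelace formula. This only illustrates the rule stated in the Area description; it is not the product's algorithm, it does not cover SPHERICAL computation or unit conversion, and the function names are hypothetical.

```python
def ring_area(ring):
    """Planar area enclosed by a ring of (x, y) vertices (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]  # 10 x 10 square
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]          # 2 x 2 square hole
print(polygon_area(exterior, [hole]))  # 96.0
```

The result is expressed in the square of whatever unit the coordinates use.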


ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter    Type    Description

geometry1    WritableGeometry    The first instance of a WritableGeometry.

geometry2    WritableGeometry    The second instance of a WritableGeometry.

Return Values

Return Type    Description

Array<WritableGeometry>    The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;


Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter    Type    Description

geometry1    WritableGeometry    The first instance of a WritableGeometry.

geometry2    WritableGeometry    The second instance of a WritableGeometry.

linearUnits    String    The desired return unit type. For valid values, see Linear Units below.

computationType    String    Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value    Description

mi    miles

km    kilometers

in    inches

ft    feet

yd    yards

mm    millimeters

cm    centimeters

m    meters

survey ft    US Survey feet

nmi    nautical miles

Return Values

Return Type    Description

Double    The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
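The difference between the two computation types can be illustrated for a pair of points with a short Python sketch. The spherical branch uses the haversine formula on a sphere with an assumed mean Earth radius, and the cartesian branch is a plain straight-line distance in coordinate units. Both are stand-ins chosen for illustration; this guide does not state which formulas the Distance function uses internally, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius, in meters


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


# Treating degrees as planar units gives 5.0 "degrees", which is not a length
# on the ground; the spherical result is in meters.
print(cartesian_distance(0, 0, 3, 4))    # 5.0
print(spherical_distance_m(0, 0, 3, 4))
```

This is why SPHERICAL is the only valid type for geographic (long/lat) coordinate systems in the table above.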


Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and any holes). Curves are treated as thin polygons.
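The "sum of ring lengths" rule can be sketched in a few lines of Python. This is a planar (CARTESIAN-style) illustration with hypothetical function names; it does not reproduce the UDF's spherical computation or unit handling.

```python
import math


def ring_length(ring):
    """Length of a ring given as [(x, y), ...]; the ring is closed if it is not already."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def polygon_perimeter(exterior, holes=()):
    """Perimeter = exterior ring length + the length of every hole ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)
```

For a 10 x 10 square with a 2 x 2 hole, the sketch yields 40 + 8 = 48 coordinate units: the hole adds to the perimeter rather than subtracting from it.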

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
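The effect of the resolution parameter is easiest to see for a point buffer. The Python sketch below approximates a circle with a given number of straight segments in planar coordinates. The function name is hypothetical, and placing the vertices on the circle (rather than outside it) is an assumption; the guide does not say which the Buffer UDF does.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Closed ring approximating a circle of radius `offset` around (x, y)
    using `resolution` straight-line segments (planar coordinates)."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring
```

With a resolution of 8 the ring has eight distinct vertices (an octagon); doubling the resolution doubles the vertex count, which is why larger values cost more time and space.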

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
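As an illustration of what a convex hull is, the following Python sketch computes one with Andrew's monotone chain algorithm. This is a standard textbook technique swapped in for illustration; the guide does not state which algorithm the ConvexHull UDF uses, and the function name is hypothetical.

```python
def convex_hull(points):
    """Convex hull of (x, y) points, returned counter-clockwise without repeating
    the first vertex (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```

Applied to the vertices of the MULTIPOLYGON in the last example above, the sketch keeps only the outermost vertices; the hole vertices and the inward-pointing vertex (20 35) drop out.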

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
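For the specific epsg:4326 to epsg:3857 case used in the first example, the coordinate change can be written out directly. The Python sketch below applies the standard spherical Mercator formulas; it illustrates what such a transformation computes and is not the implementation behind the Transform UDF, which handles arbitrary coordinate systems. The function name is hypothetical.

```python
import math

# Radius used by EPSG:3857 (the WGS 84 semi-major axis, in meters).
R = 6378137.0


def wgs84_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees (EPSG:4326) to x/y in meters (EPSG:3857)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

The formula is undefined at the poles (latitude of exactly 90 or -90 degrees), so inputs should stay inside the valid Mercator latitude range.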

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another directly. The ST_X and ST_Y UDFs make this possible by extracting the ordinates from the transformed geometry.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
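The bounds-filtering use case can be sketched as follows. The Python below computes an MBR from a list of vertices and uses it to filter XY records; the function names and the record layout are hypothetical and only mirror what the four min/max values would be used for.

```python
def mbr(points):
    """Minimum bounding rectangle of (x, y) points as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(records, bounds):
    """Keep the (x, y) records that fall inside the bounds (edges included)."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in records if xmin <= x <= xmax and ymin <= y <= ymax]
```

An MBR test is only a coarse filter: a record inside the rectangle may still lie outside the geometry itself, so an exact spatial predicate is usually applied to the survivors.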

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
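The relationship between an ID, its precision, and its cell boundary can be illustrated with the publicly documented geohash scheme. The Python sketch below assumes that GeoHashID and GeoHashBoundary follow that standard base-32 encoding; the guide does not confirm this, and the function names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per character."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, even = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid  # keep the upper half
        else:
            rng[1] = mid  # keep the lower half
        even = not even
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash into its cell bounds (xmin, ymin, xmax, ymax)."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    even = True
    for ch in hash_id:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if even else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            even = not even
    return lon_rng[0], lat_rng[0], lon_rng[1], lat_rng[1]
```

Because each extra character only subdivides the parent cell, a shorter ID is always a prefix of the longer ID for the same point. That prefix property is what makes the IDs useful for sorting, searching, and GROUP BY aggregation as in the last example above.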

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions in the Appendix.

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder in the Appendix.
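The containment test behind a point-in-polygon search can be illustrated with the classic ray-casting technique. The Python sketch below is a planar illustration with hypothetical names and an in-memory record list; it does not reflect how the UDTF reads TAB or shapefile data, uses PGD indexes, or treats points that fall exactly on a boundary.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray from (x, y) toward +x crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(x, y, exterior, holes=()):
    """Inside the exterior ring and not inside any hole."""
    return point_in_ring(x, y, exterior) and not any(point_in_ring(x, y, h) for h in holes)


def search_polygons(x, y, records):
    """Return the attributes of every record whose polygon contains the point.
    `records` is a list of (attributes, exterior_ring) pairs."""
    return [attrs for attrs, ring in records if point_in_polygon(x, y, ring)]
```

Like the UDTF, the sketch can return more than one row per input point when polygons in the data source overlap.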

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions in the Appendix.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters in the Appendix.

Return Values

Return Type Description

geometry The geometry or geometries in the specified TAB file or shapefile that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in our query result. In this example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
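The combined effect of the maxCandidates, maxDistance, and distanceUnit options can be pictured with a small brute-force search. The Python sketch below is illustrative only and is not the product's implementation: the function names (haversine_m, search_nearest), the unit table, and the sample coordinates are assumptions made for this example, and it measures a spherical great-circle distance, which may differ from the distance the UDTF computes.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed spherical model)
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344}  # illustrative subset of units


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance=None, distance_unit="m"):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    Distances are expressed in distance_unit; max_distance=None means no limit.
    """
    factor = METERS_PER_UNIT[distance_unit]
    scored = []
    for name, lon, lat in candidates:
        d = haversine_m(point[0], point[1], lon, lat) / factor
        if max_distance is None or d <= max_distance:
            scored.append((name, d))
    scored.sort(key=lambda item: item[1])
    return scored[:max_candidates]


# Hypothetical sample data: (name, longitude, latitude)
capitals = [
    ("Albany", -73.7562, 42.6526),
    ("Hartford", -72.6734, 41.7658),
    ("Boston", -71.0589, 42.3601),
    ("Sacramento", -121.4944, 38.5816),
]
search_point = (-73.0, 42.0)
result = search_nearest(search_point, capitals, max_candidates=3, max_distance=150, distance_unit="mi")
for name, miles in result:
    print(name, round(miles, 1))
```

With the defaults (max_candidates=1, no distance limit) the function returns only the single closest record, which mirrors the documented default behavior of returning one result per search point.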


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always yields the same hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to.

df1Longitude Column The longitude value from the first dataframe.

df1Latitude Column The latitude value from the first dataframe.

df2Longitude Column The longitude value from the second dataframe.

df2Latitude Column The latitude value from the second dataframe.

maxDistance Length The buffer length around point 1 to search for point 2.

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
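To give a feel for what a geohash precision means, the following Python sketch implements the standard public geohash encoding and reports the cell size at a given precision. It is an assumption-laden illustration, not the product's code: the guide does not describe how joinByDistance uses geohashes internally, and the function names (geohash_encode, cell_size_degrees) are hypothetical. The general idea behind geohash-keyed joins is that points sharing a cell ID can be matched cheaply before an exact distance check.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a WGS84 latitude/longitude as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def cell_size_degrees(precision):
    """Width and height, in degrees, of a geohash cell at the given precision."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


print(geohash_encode(57.64911, 10.40744, 7))
print(cell_size_degrees(7))
```

Each extra character divides a cell into 32 smaller cells, so a higher precision produces many more, much smaller cells. At precision 7 a cell is about 0.0014 degrees on each side, roughly 150 meters at the equator, which is consistent with the note that higher values may require more memory.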


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

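This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

// Build a 0.5 mile search distance
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

// Join baseDF to joinDF using geohash precision 7
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

// DistanceColumnName adds an outputDistance column that holds the calculated distance
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))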


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue -> Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that contains all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have 775 permissions: Read, Write, and Execute for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
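To make the 775 value concrete, the short Python sketch below decodes an octal mode into its owner, group, and other permission bits using only the standard library. The helper name describe_mode is an assumption made for this illustration; it is not part of the product.

```python
import stat


def describe_mode(mode):
    """Return the read/write/execute flags for owner, group and others."""
    classes = {
        "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "others": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    }
    return {
        who: {"read": bool(mode & r), "write": bool(mode & w), "execute": bool(mode & x)}
        for who, (r, w, x) in classes.items()
    }


mode = 0o775
# Render the mode the way `ls -l` would show it for a directory
print(stat.filemode(stat.S_IFDIR | mode))
print(describe_mode(mode))
```

The owner and group digits (7) grant read, write, and execute, so every member of the common group can update the downloaded data, while the final digit (5) leaves all other users with read and execute access only.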


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
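The wildcard behavior of Like and the inclusive nature of Between can be sketched in a few lines of Python. This is a minimal illustration of the semantics described in the table, not the MI SQL engine: the helper names (like, between) are hypothetical, and the sketch assumes case-sensitive matching with no escape character, neither of which is stated in this guide.

```python
import re


def like(value, pattern):
    """Return True if value matches a Like pattern.

    '%' stands for zero, one, or more characters and '_' for exactly one.
    Matching is case-sensitive here, which is an assumption of this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat every other character literally
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None


def between(value, low, high):
    """Inclusive range test: both end points count as inside the range."""
    return low <= value <= high


print(like("Central Park", "%Park"))   # name ends with Park
print(like("Park Avenue", "%Park"))    # name does not end with Park
print(like("cat", "c_t"))              # exactly one character between c and t
print(between(10, 10, 20))             # lower bound is included
```

Combining the two wildcards works the same way, for example "%Park%" matches any value that contains Park anywhere.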


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two single-quote characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
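When filter expressions are assembled in code, the quoting rules above can be applied with two small helpers. The Python sketch below is illustrative only: the function names (quote_literal, quote_identifier) are hypothetical, and because this guide does not say how a double quote inside an identifier should be escaped, the sketch rejects such identifiers rather than guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because their escaping rule
    is not covered here (an assumption of this sketch).
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
query = "SELECT * FROM " + table + " WHERE Country = " + quote_literal("Canada")
print(query)
print(quote_literal("O'hara's"))
```

Doubling the quote before wrapping the value means that a name such as O'hara's cannot terminate the literal early and break the expression.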


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. The job may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).


WritableGeometry

This is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats such as WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats such as WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that perform operations on it. For example:

To calculate the length of a geometry:

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information about the Writable interface, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html


Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions


Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

FromGeoJSON(String jsonGeometry)

Parameters

Parameter Type Description

jsonGeometry String The geometry in GeoJSON format.

Return Values

Return Type Description

WritableGeometry The geometry from GeoJSON format.

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

fromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The geometry in WKT format

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKT format

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

fromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The WKB of the geometry in byte array format (byte[ ])

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKB format

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
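To see what the hexadecimal string in the example above encodes, the following Python sketch decodes it by hand using the standard WKB layout (one byte-order byte, a 4-byte geometry type, then two 8-byte doubles for a 2D point). It is an illustration only and is not part of the product; the helper name decode_wkb_point is hypothetical.

```python
import struct


def decode_wkb_point(hex_string):
    """Decode a 2D WKB point given as a hex string (illustrative helper)."""
    data = bytes.fromhex(hex_string)
    # Byte 0: 1 = little endian (NDR), 0 = big endian (XDR).
    byte_order = "<" if data[0] == 1 else ">"
    # Bytes 1-4: geometry type code; 1 means Point.
    (geom_type,) = struct.unpack_from(byte_order + "I", data, 1)
    if geom_type != 1:
        raise ValueError("this sketch only handles 2D WKB points")
    # Bytes 5-20: X and Y as IEEE 754 doubles.
    x, y = struct.unpack_from(byte_order + "dd", data, 5)
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

Under these assumptions the string passed to unhex in the example corresponds to the point with X = 10 and Y = 10.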

FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

fromKML(String geometry)

Parameters

Parameter Type Description

geometry String A KML string; only the geometry, or the geometry in a placemark, is parsed

Return Values

Return Type Description

WritableGeometry The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter Type Description

X String or Number The X ordinate

Y String or Number The Y ordinate

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified XY coordinates. If any of the argument values are invalid, an empty geometry is returned.

Examples

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise False

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries; otherwise False

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise False

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise False

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore

FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty; otherwise False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units.

computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

The following area units are valid values for areaUnits:

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
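The rule stated in the Area description (exterior ring area minus interior ring areas) can be illustrated with a small Python sketch. It uses the planar shoelace formula, so it corresponds only to Cartesian logic; how the product computes SPHERICAL areas is not described here. The function names are hypothetical and not part of the SDK.

```python
def ring_area(ring):
    """Planar area of a ring given as a list of (x, y) tuples (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_area(exterior, [hole]))
```

In this sketch a 10 x 10 square with a 2 x 2 hole yields 96 square units; a point or an open curve has no ring and therefore no area.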

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for the unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
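To give a feel for the difference between the two computation types, the Python sketch below compares a planar (Cartesian) distance with a great-circle distance between two long/lat points. The haversine formula and the mean Earth radius used here are assumptions chosen for illustration; the guide does not state which spherical algorithm the Distance UDF uses, so results may differ from the product's output.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed value)


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of latitude is roughly 111 km on the sphere,
# while the Cartesian result is simply 1 (in degrees).
print(haversine_m(0.0, 0.0, 0.0, 1.0))
print(cartesian_distance(0.0, 0.0, 0.0, 1.0))
```

This is why Cartesian logic is only meaningful for projected or engineering coordinate systems, where the coordinates are already linear measurements.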

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for the unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for the unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for the unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
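The effect of the resolution parameter can be pictured with the following Python sketch, which approximates a circular buffer around a point with a fixed number of straight-line segments in planar coordinates. It assumes the vertices lie on the circle itself; whether the product inscribes or circumscribes the polygon is not stated in this guide. The function name buffer_point is hypothetical.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Return a closed ring of (x, y) vertices approximating a circle.

    resolution is the number of straight-line segments used, so a
    resolution of 8 produces an octagon.
    """
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring by repeating the first vertex
    return ring


octagon = buffer_point(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)  # number of segments
```

A larger resolution gives a smoother outline but produces more vertices, which is consistent with the note that such buffers take more time and space to compute.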

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
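As a rough illustration of what "smallest convex geometry" means, the Python sketch below computes the convex hull of a set of planar points with Andrew's monotone chain algorithm. This is a textbook technique chosen for the example, not necessarily the algorithm used by the ConvexHull UDF, and the function name is hypothetical.

```python
def convex_hull(points):
    """Convex hull of (x, y) tuples, counter-clockwise, without a closing vertex."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) is dropped because it does not contribute to the outline; only the four corners remain.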

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by returning the ordinates of a point geometry.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
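The MBR idea behind the four min/max functions can be sketched in a few lines of Python. The example derives the bounding rectangle from a list of vertices and then uses it to filter XY records, which mirrors the filtering use case described above. The function names and sample data are hypothetical and only illustrate the concept.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, box):
    """True when the point lies inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


polygon = [(20, 35), (10, 30), (10, 10), (30, 5), (45, 20)]
box = mbr(polygon)
records = [(15, 15), (50, 50), (45, 5)]
print(box)
print([r for r in records if in_mbr(r[0], r[1], box)])
```

Note that a bounding-rectangle test is only a coarse filter: a point can fall inside the MBR and still lie outside the geometry itself.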

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
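The link between an ID string and its rectangular cell can be sketched outside Hive. The Python below decodes an ID with the widely used public geohash scheme (base-32 alphabet, alternating longitude and latitude bits). That the GeoHashBoundary UDF follows exactly this scheme is an assumption, and geohash_bounds is a hypothetical helper name, not part of the product.

```python
# Sketch of the public geohash decoding scheme (assumed, not the product's code).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    is_lon = True  # bits alternate between axes, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # each character carries 5 bits
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the range
            else:
                rng[1] = mid  # keep the lower half of the range
            is_lon = not is_lon
    return lon[0], lat[0], lon[1], lat[1]


# The cell that contains the sample point (-73.750333, 42.736103) at precision 3.
print(geohash_bounds("dre"))  # (-74.53125, 42.1875, -73.125, 43.59375)
```

Each extra character adds five bits, so the cell shrinks quickly as the ID gets longer: a one-character cell spans 45 degrees of longitude, while the three-character cell above spans about 1.4 degrees.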


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
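The aggregation pattern in the last query (hash each point, then group by the hash) can be illustrated in plain Python. This is a minimal sketch that assumes the UDF uses the public geohash encoding; geohash_id is a hypothetical stand-in for GeoHashID, and the sample points are made up for illustration.

```python
from collections import Counter

# Public geohash alphabet; assumed to match the IDs that the UDF returns.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash of the given length."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    chars = []
    value, nbits, is_lon = 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon, x) if is_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value = value << 1
            rng[1] = mid
        is_lon = not is_lon
        nbits += 1
        if nbits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


# Illustrative points: two near Albany, NY and one in London.
points = [(-73.750333, 42.736103), (-73.76, 42.65), (-0.1276, 51.5072)]

# Equivalent of GROUP BY hashID with count(*): points per grid cell.
counts = Counter(geohash_id(x, y, 3) for x, y in points)
print(counts)
```

Points that share a cell receive the same ID, so grouping on the ID is enough to aggregate by location without any geometry comparison.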


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique hexagon hash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique square hash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
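This guide does not specify how square hash IDs are built. The sample ID 03332 uses only the digits 0 to 3, which resembles a quadtree key over a Mercator tile grid. The sketch below uses the public Web Mercator quadkey scheme purely as a stand-in to show why such cells look square on a Mercator map; it is an assumption, not the product's algorithm, and quadkey is a hypothetical name.

```python
import math


def quadkey(x, y, precision):
    """Quadtree key of the Web Mercator tile containing lon x, lat y.

    Stand-in illustration only; the product's square hash may differ.
    """
    n = 1 << precision  # number of tiles per axis at this level
    col = int((x + 180.0) / 360.0 * n)
    # Mercator projects latitude so that tiles are square on the map.
    merc = math.asinh(math.tan(math.radians(y)))
    row = int((1.0 - merc / math.pi) / 2.0 * n)
    # Clamp so that extreme latitudes still fall into a valid tile.
    col = min(max(col, 0), n - 1)
    row = min(max(row, 0), n - 1)
    digits = []
    for level in range(precision, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if col & mask else 0) + (2 if row & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


print(quadkey(-73.750333, 42.736103, 3))
print(quadkey(-73.750333, 42.736103, 5))
```

In this scheme every additional digit splits the parent tile into four equal squares in projected space, so a longer key is always a refinement of the shorter one.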


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.
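Because the options are passed as alternating key and value strings inside Hive's built-in map() constructor, it can help to generate that expression from a dictionary when queries are assembled by a script. The helper below is a hypothetical convenience, not part of the product, and it assumes that all option values are supplied as strings.

```python
def hive_map_literal(options):
    """Render a dict as a Hive map('k1', 'v1', 'k2', 'v2', ...) expression."""
    parts = []
    for key, value in options.items():
        for item in (key, value):
            # Escape backslashes and single quotes for a Hive string literal.
            escaped = str(item).replace("\\", "\\\\").replace("'", "\\'")
            parts.append("'" + escaped + "'")
    return "map(" + ", ".join(parts) + ")"


# Option names come from the table above; the values are illustrative.
shapefile_options = {
    "shpCharset": "utf-8",
    "shpCrs": "epsg:4326",
}
print(hive_map_literal(shapefile_options))
# map('shpCharset', 'utf-8', 'shpCrs', 'epsg:4326')
```

The resulting text can be placed as the third argument of the function call when the query string is built.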


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
'downloadLocation', '/pb/pip/download',
'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
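The idea behind a point-in-polygon lookup that can return several rows per input point can be sketched in a few lines of Python. This uses the classic ray-casting test on planar coordinates; it is an illustration of the concept only, not how the UDTF is implemented, and the polygons are made-up sample data.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


# Two overlapping sample polygons standing in for rows of a TAB file.
polygons = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(3, 3), (6, 3), (6, 6), (3, 6)],
}


def containing(x, y):
    """Return the names of all polygons that contain the point."""
    return [name for name, ring in polygons.items() if point_in_polygon(x, y, ring)]


print(containing(3.5, 3.5))  # ['A', 'B']
print(containing(1, 1))      # ['A']
```

A point inside the overlap yields one result per containing polygon, which mirrors how the LATERAL VIEW in the query above can produce more than one output row for a single input point.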


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
'STATECAP.TAB',
map('maxCandidates', '3',
'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
'downloadLocation', '/pb/search/download',
'downloadGroup', 'pbdownloads',
'queryFilter', 'Name Like Park'))
nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
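How the maxCandidates and maxDistance options shape the result can be sketched with a brute-force search in Python. The haversine distance, the earth radius, and the sample coordinates below are assumptions made for illustration; the guide does not describe how the UDTF measures distance, and search_nearest is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius; an assumption for this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, features, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    lon, lat = point
    scored = []
    for name, (flon, flat) in features.items():
        d = haversine_m(lon, lat, flon, flat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Approximate, illustrative coordinates for three state capitals.
capitals = {
    "Albany": (-73.7562, 42.6526),
    "Hartford": (-72.6734, 41.7658),
    "Boston": (-71.0589, 42.3601),
}

query_point = (-73.750333, 42.736103)
print(search_nearest(query_point, capitals))                    # default: one result
print(search_nearest(query_point, capitals, max_candidates=3))  # up to three results
```

With the defaults, only the single nearest feature comes back. Raising max_candidates returns more rows per search point, and a distance limit can reduce that number again.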


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Closely approximating circles, hexagons include edge data better than if rectangles were used. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values (one set from each dataframe) that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
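The guide states only that geohashPrecision must be between 1 and 12 and that higher values may need more memory; it does not explain how the value relates to maxDistance. As a rough aid when picking a value, the sketch below computes the cell size of the public geohash scheme at a given precision. Whether the join uses exactly this scheme is an assumption, and the meters-per-degree factor is an equatorial approximation.

```python
def geohash_cell_degrees(precision):
    """Return (width, height) in degrees of a public-scheme geohash cell."""
    total_bits = 5 * precision          # each character encodes five bits
    lon_bits = (total_bits + 1) // 2    # longitude receives the extra bit
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


METERS_PER_DEGREE = 111320  # approximate length of one degree at the equator

for precision in (5, 7, 9):
    width, height = geohash_cell_degrees(precision)
    print(
        precision,
        round(width * METERS_PER_DEGREE, 1), "m wide,",
        round(height * METERS_PER_DEGREE, 1), "m high",
    )
```

Comparing these cell sizes with the search radius (0.5 miles is roughly 805 meters) gives a feel for how many cells a single buffer would touch at each precision.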


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case:

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue -> Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must then regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uktab file

PGDBuilder -f Cdatauktab

This request will generate a PGD file for a seamless TAB file limited to 4 concurrent threads

PGDBuilder -f Cseamlesstab -p 4
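
If you script index builds, it can help to assemble the PGDBuilder arguments in one place. The following is a minimal Python sketch, not part of the product: the function name build_pgd_command is hypothetical, and it only composes the -f and -p arguments described above. How the resulting list is launched (for example with subprocess.run) depends on where the utility is installed in your environment.

```python
from typing import List, Optional


def build_pgd_command(tab_path: str, parallel: Optional[int] = None) -> List[str]:
    """Compose the argument list for a PGDBuilder call (hypothetical helper)."""
    if not tab_path:
        # -f <file> is the only required parameter.
        raise ValueError("the -f <file> parameter is required")
    args = ["PGDBuilder", "-f", tab_path]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("-p must be a positive number of processes")
        # Optional cap on concurrent processes, useful for seamless TABs.
        args += ["-p", str(parallel)]
    return args


if __name__ == "__main__":
    print(build_pgd_command(r"C:\data\uk.tab"))
    print(build_pgd_command(r"C:\seamless.tab", parallel=4))
```

Leaving out parallel keeps the documented default (the number of CPUs available on the machine).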

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
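
To make the 775 mode concrete, the short Python sketch below decodes it with the standard stat module. It is only an illustration: the helper names describe_mode and group_can_write are hypothetical, and nothing here touches a real directory.

```python
import stat

# Octal mode from the step above: rwx for owner, rwx for group, r-x for others.
DOWNLOAD_DIR_MODE = 0o775


def describe_mode(mode: int) -> str:
    """Render a directory mode the way `ls -l` would show it."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode: int) -> bool:
    """True if members of the owning group may create or update files."""
    return bool(mode & stat.S_IWGRP)


if __name__ == "__main__":
    print(describe_mode(DOWNLOAD_DIR_MODE))
    print(group_can_write(DOWNLOAD_DIR_MODE))
```

The group write bit is what lets every service user in the common group update the downloaded data; a 755 directory would allow them to read it but not refresh it.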

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator             Definition
Attribute operators  =, <>, !=, <, <=, >, >=
Between              Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects  Returns true if the envelopes (MBRs) of the operands intersect.
Contains             Returns true if the first object contains all of the second object.
Within               Returns true if the first object is entirely inside the second object.
ContainsCentroid     Returns true if the first object contains the centroid of the second object.
CentroidWithin       Returns true if the first object's centroid is within the second object.
Intersects           Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)            Returns true if the value equals at least one of the values in the literal list or subquery.
Like                 Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                  Returns true if both conditions in the WHERE clause are true.
OR                   Returns true if either the first or second condition is true.
NOT                  Reverses the meaning of the logical operator with which it is used.
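
The wildcard and range rules above can be modeled in a few lines. The Python sketch below is an approximation for reasoning about patterns, not the MI SQL implementation: the helper names like and between are hypothetical, matching is assumed to be case-sensitive, and escape characters are not handled.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Model of Like: '%' matches zero or more characters, '_' exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Everything else is matched literally.
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


def between(value, low, high) -> bool:
    """Model of Between: both ends of the range are included."""
    return low <= value <= high


if __name__ == "__main__":
    print(like("Troy", "T%"), like("Troy", "T_oy"), like("Troy", "T_y"))
    print(between(10, 1, 10))
```

Combining the symbols works as expected in this model; for example, the pattern `_r%` matches any value whose second character is `r`.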

Arithmetic Operators

Operator  Definition
+         Addition; also the concatenation operator. NOTE: String concatenation also uses &.
-         Subtraction
*         Multiplication
/         Division
^         Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter  Definition
( )        Expression delimiters
' '        String constant delimiters. See Quote Rules.
" "        Quoted identifier delimiters
% _        Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
,          List items and function argument separators
@          Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
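
When query text is assembled by a program, the quoting rules above are easy to apply consistently with two small helpers. This Python sketch is illustrative only: quote_literal and quote_identifier are hypothetical names, and because this section does not say how a double-quote inside an identifier should be written, the sketch rejects that case rather than guess.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes (needed for spaces or special characters)."""
    if '"' in name:
        # Escaping rules for embedded double quotes are not covered here.
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    print(f"SELECT * FROM {table} WHERE Country = {quote_literal('Canada')}")
    print(f"SELECT * FROM Streets WHERE Business = {quote_literal(chr(79) + chr(39) + 'hara' + chr(39) + 's')}")
```

Where the query interface supports parameters, binding values is generally safer than building literals by hand.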

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc. All rights reserved.

Spatial

WritableGeometry

This is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:

To get an instance of WritableGeometry from WKT:

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from a WKB string:

SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:

To serialize an instance of WritableGeometry to WKT:

SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that perform some operation on it. For example:

To calculate the length of a geometry:

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html.

Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions

Constructor Functions

The following Constructor functions are available

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

fromGeoJSON(String jsonGeometry)

Parameters

Parameter     Type    Description
jsonGeometry  String  The geometry in GeoJSON format.

Return Values

Return Type       Description
WritableGeometry  The geometry from GeoJSON format.

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;
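
If the GeoJSON text is produced by a program before it is loaded into Hive, generating it with a JSON library avoids hand-formatting mistakes. The Python sketch below is a hedged illustration: point_geojson is a hypothetical helper, and it builds only the Point geometry object used in the first example above.

```python
import json


def point_geojson(x: float, y: float) -> str:
    """Serialize a GeoJSON Point geometry (x = longitude, y = latitude)."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


if __name__ == "__main__":
    text = point_geojson(100.0, 0.0)
    # The serialized text contains no single quotes, so it can be placed
    # inside a single-quoted literal for a quick interactive test.
    print(f"SELECT FromGeoJSON('{text}');")
```

For table-driven queries, the same text would normally be stored in a String column and passed as FromGeoJSON(t.geometry).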

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

fromWKT(String geometry[, SpatialInfo CRS])

Parameters

Parameter  Type    Description
geometry   String  The geometry in WKT format.
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type       Description
WritableGeometry  The geometry from WKT format.

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

fromWKB(String geometry[, SpatialInfo CRS])

Parameters

Parameter  Type    Description
geometry   String  The WKB of the geometry in byte array format (byte[ ]).
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type       Description
WritableGeometry  The geometry from WKB format.

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
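
To see what the hexadecimal string in the example encodes, the Python sketch below unpacks it by hand following the standard WKB layout (one byte-order byte, a 4-byte geometry type, then two 8-byte doubles for a point). It is a reading aid rather than a WKB parser: decode_wkb_point is a hypothetical name and only 2D points are handled.

```python
import struct
from typing import Tuple


def decode_wkb_point(hex_text: str) -> Tuple[float, float]:
    """Decode a hex-encoded 2D WKB Point into (x, y)."""
    data = bytes.fromhex(hex_text)
    # Byte 0: 1 = little-endian (NDR), 0 = big-endian (XDR).
    order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    if geom_type != 1:
        raise ValueError("this sketch only handles WKB Point (type 1)")
    x, y = struct.unpack_from(order + "dd", data, 5)
    return x, y


if __name__ == "__main__":
    print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

The example string decodes to the point with X = 10 and Y = 10.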

FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

fromKML(String geometry)

Parameters

Parameter  Type    Description
geometry   String  A KML string, where only the geometry or the geometry in a placemark will be parsed.

Return Values

Return Type       Description
WritableGeometry  The geometry from KML format.

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y[, SpatialInfo CRS])

Parameters

Parameter  Type              Description
X          String or Number  The X ordinate.
Y          String or Number  The Y ordinate.
CRS        String            Optional. The coordinate system for the geometry. Default = EPSG:4326.

Return Values

Return Type       Description
WritableGeometry  The geometry of the specified XY coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.

Examples

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns a GeoJSON-formatted text representation of the geometry from the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type     Description
GeoJSON String  The GeoJSON representation of a geometry.

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type  Description
String       The WKT representation of a geometry.

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type  Description
Byte[ ]      The WKB representation of a geometry, expressed as a byte array.

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.

Return Values

Return Type  Description
String       The KML representation of a geometry, expressed as a hexadecimal encoded string.

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if the two geometry objects have no points in common; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if there is any direct position in common between the two geometries; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if geometry1 overlaps geometry2; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type  Description
Boolean      True if geometry2 entirely contains geometry1; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
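
For intuition about what a Within test involves when the first geometry is a point and the second is a polygon, the Python sketch below uses the classic ray-casting (even-odd) rule on planar coordinates. It is a swapped-in textbook technique, not the product's algorithm: the function names are hypothetical, and points lying exactly on a boundary are not treated specially.

```python
from typing import Sequence, Tuple

Ring = Sequence[Tuple[float, float]]


def point_in_ring(x: float, y: float, ring: Ring) -> bool:
    """Even-odd rule: count how many ring edges a ray to the right crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_within_polygon(x: float, y: float, exterior: Ring, holes: Sequence[Ring] = ()) -> bool:
    """Inside the exterior ring and outside every interior ring (hole)."""
    if not point_in_ring(x, y, exterior):
        return False
    return not any(point_in_ring(x, y, hole) for hole in holes)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(point_within_polygon(2, 2, square, [hole]))
    print(point_within_polygon(5, 5, square, [hole]))
```

In the Hive query above, the UDF performs the equivalent test for each pairing of a policy location with a risk boundary.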

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter      Type              Description
inputGeometry  WritableGeometry  The input geometry to be checked for a null or empty value.

Return Values

Return Type  Description
Boolean      True if the geometry is null or empty; otherwise, False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits[, String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
areaUnits        String            The desired return unit type. For valid values, see Area Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
                                   • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
                                   • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
                                   • For engineering coordinate systems: Valid type = CARTESIAN (default)
                                   CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
                                   SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units:

Value         Description
sq mi         square miles
sq km         square kilometers
sq in         square inches
sq ft         square feet
sq yd         square yards
sq mm         square millimeters
sq cm         square centimeters
sq m          square meters
sq survey ft  square US Survey feet
sq nmi        square nautical miles
acre          acres
ha            hectares

Return Values

Return Type  Description
Double       The area of the geometry.

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
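
The rule that a polygon's area is its exterior ring minus its interior rings can be checked on a small planar example. The Python sketch below uses the shoelace formula, which corresponds only to CARTESIAN-style interpretation of coordinates; it does not reproduce the SPHERICAL computation, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Ring = Sequence[Tuple[float, float]]


def ring_area(ring: Ring) -> float:
    """Planar area enclosed by a ring, via the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    # The sign depends on winding order, so take the absolute value.
    return abs(total) / 2.0


def polygon_area(exterior: Ring, interiors: Sequence[Ring] = ()) -> float:
    """Exterior ring area minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_area(square, [hole]))
```

The result is in squared coordinate units; the Area UDF additionally converts to the requested areaUnits.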

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type              Description
Array<WritableGeometry>  The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits[, String computationType])

Parameters

Parameter        Type              Description
geometry1        WritableGeometry  The first instance of a WritableGeometry.
geometry2        WritableGeometry  The second instance of a WritableGeometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
                                   • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
                                   • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
                                   • For engineering coordinate systems: Valid type = CARTESIAN (default)
                                   CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
                                   SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type  Description
Double       The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
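
The difference between the two computation types can be illustrated for a pair of points. The Python sketch below contrasts a planar (CARTESIAN-style) distance with a great-circle distance from the haversine formula. This is an assumption-laden illustration: the guide does not state which spherical formula or earth radius the product uses, so the haversine formula and the mean radius below are stand-ins, and the function names are hypothetical.

```python
import math

# Assumed mean earth radius in meters; the product's value may differ.
EARTH_RADIUS_M = 6371008.8


def cartesian_distance(x1: float, y1: float, x2: float, y2: float) -> float:
    """Straight-line distance in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def haversine_distance_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two long/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


if __name__ == "__main__":
    print(cartesian_distance(0, 0, 3, 4))
    print(haversine_distance_m(-73.750333, 42.736103, -73.0, 42.0))
```

For geographic coordinates, a planar calculation on degrees is not meaningful as a ground distance, which is why SPHERICAL is the default for long/lat coordinate systems.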

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits[, String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
                                   • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
                                   • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
                                   • For engineering coordinate systems: Valid type = CARTESIAN (default)
                                   CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
                                   SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type  Description
Double       The length of the geometry.

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
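
When results in one linear unit need to be compared with values in another, a small conversion table keyed by the unit codes above is convenient. The Python sketch below uses the standard definitions of each unit in meters; these factors are general knowledge rather than values taken from the product, and convert_length is a hypothetical helper.

```python
# Meters per unit, keyed by the linearUnits codes listed above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a length between two of the unit codes via meters."""
    try:
        factor = METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]
    except KeyError as exc:
        raise ValueError(f"unknown linear unit: {exc.args[0]}") from None
    return value * factor


if __name__ == "__main__":
    print(convert_length(1.0, "km", "m"))
    print(convert_length(26.2, "mi", "km"))
```

Requesting the target unit directly in the UDF call avoids a second conversion step when the unit is known up front.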

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units below.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.

Valid computationType values:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles

Return Values

Return Type | Description
Double | The perimeter of the geometry.

Examples

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The geometry to buffer.
offset | Number | The distance from the input geometry.
linearUnits | String | The unit type of the offset distance. For valid values, see Linear Units below.
resolution | Number | Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.

Valid computationType values:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles

Return Values

Return Type | Description
WritableGeometry | A geometry consisting of all the points within the offset distance of the input geometry.

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
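The effect of the resolution parameter can be sketched in a few lines of Python. This is an illustration only, not the product's algorithm: it assumes planar (cartesian) coordinates and places the polygon vertices on the circle itself, and the function names are hypothetical.

```python
import math


def point_buffer(x, y, radius, resolution):
    """Return a closed ring of `resolution` straight segments approximating a circle."""
    ring = [
        (x + radius * math.cos(2 * math.pi * i / resolution),
         y + radius * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring by repeating the first vertex
    return ring


def ring_area(ring):
    """Shoelace formula for the area enclosed by a closed ring."""
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2
```

With a resolution of 8 the ring is an octagon. Raising the resolution brings the enclosed area closer to that of the true circle at the cost of more vertices, which is the time and space trade-off described in the parameter table.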

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
WritableGeometry | The convex hull of the geometry.

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
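To make the definition of a convex hull concrete, the following Python sketch computes one for a set of planar points with Andrew's monotone chain algorithm. The algorithm is a stand-in chosen for illustration; the guide does not say how ConvexHull is implemented, and the function name here is hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]
```

Applied to the vertices of the MULTIPOLYGON in the last example, the hull keeps the outer vertices of both polygons and drops the vertices of the hole and the concave vertex (20 35).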

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
WritableGeometry | The geometry formed from the direct positions that are common to both input geometries.

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The source input geometry.
CRS | String | The destination coordinate system for the geometry.

Return Values

Return Type | Description
WritableGeometry | The geometry transformed to the destination coordinate system.

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
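As a rough illustration of what a coordinate system transformation involves, the following Python sketch converts a longitude/latitude pair in degrees to spherical (web) Mercator meters, which is the projection behind epsg:3857. It uses the standard spherical Mercator formulas and is not the product's transformation engine; the function name is hypothetical, and any datum shift between coordinate systems is ignored.

```python
import math

# Sphere radius used by the spherical Mercator projection (WGS 84 semi-major axis), in meters.
SPHERE_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to spherical Mercator x/y in meters."""
    x = SPHERE_RADIUS_M * math.radians(lon)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

The projection is undefined at the poles, so latitudes of exactly 90 or -90 degrees should not be passed to this sketch.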

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
WritableGeometry | The geometry that represents the union of the input geometries.

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by returning the ordinates of the transformed point.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
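Filtering an XY table by the bounds of a geometry can be sketched as follows. This Python illustration assumes the geometry is available as a plain list of (x, y) vertices and that table rows are tuples starting with x and y; both function names are hypothetical and only mirror what ST_XMin, ST_YMin, ST_XMax, and ST_YMax would supply in a Hive query.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for an iterable of (x, y) vertices."""
    xs, ys = zip(*coords)
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep the (x, y, ...) rows whose point lies inside the rectangle, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [row for row in rows if xmin <= row[0] <= xmax and ymin <= row[1] <= ymax]
```

An MBR test is a coarse filter: a point inside the rectangle may still lie outside the geometry itself, so an exact test is usually applied to the rows that survive.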

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The maximum X value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The minimum X value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The maximum Y value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The minimum Y value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

UDFs are provided for three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key that identifies the cell. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The geohash ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
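The relationship between X, Y, precision, and the returned ID can be sketched with the publicly documented geohash encoding. This Python version assumes that the "well-known" ID mentioned above is the standard geohash (longitude and latitude bits interleaved, five bits per base-32 character); the guide does not confirm this, and the function name is hypothetical. The final lines mirror the GROUP BY query above by counting points per cell.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash string of `precision` characters."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    bits, chars, use_lon = [], [], True
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        if len(bits) == 5:
            chars.append(BASE32[int("".join(map(str, bits)), 2)])
            bits = []
    return "".join(chars)


# Count points per cell, in the spirit of the GROUP BY example.
points = [(-5.6, 42.6), (-5.601, 42.601), (10.40744, 57.64911)]
counts = Counter(geohash_id(x, y, 5) for x, y in points)
```

Because each additional character subdivides the previous cell, a shorter ID is always a prefix of the longer ID for the same point, which is what makes the IDs sortable and searchable.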

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key that identifies the cell. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key that identifies the cell. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'

Note (downloadLocation): If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points to search with. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
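The core test behind a point-in-polygon search can be sketched with the classic ray-casting technique. This Python illustration is not the product's implementation: it handles only polygons (an exterior ring plus optional holes) in planar coordinates, it leaves points that fall exactly on an edge unspecified, and all names and the table layout are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count crossings of a ray toward +x against a closed ring."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(x, y, exterior, holes=()):
    """Inside the exterior ring and outside every hole."""
    return point_in_ring(x, y, exterior) and not any(point_in_ring(x, y, h) for h in holes)


def polygons_containing(x, y, table):
    """Return the attributes of every (attributes, exterior, holes) row containing the point."""
    return [attrs for attrs, exterior, holes in table if point_in_polygon(x, y, exterior, holes)]
```

Testing every row this way is linear in the size of the data source, which is the kind of cost that a prepared index such as the PGD files mentioned in the tip is meant to reduce.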

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. | 'queryFilter', 'Name Like "Park"'

Note (downloadLocation): If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

Note (queryFilter): For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like "Park"'))
nearestresult;

In the above example, id is a field from the search_points table, which supplies the points to search with. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
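How the maxCandidates, maxDistance, and queryFilter options interact can be sketched with a brute-force search. This Python illustration assumes planar coordinates, point candidates, and a filter supplied as a Python callable; it is not how the UDTF is implemented, and the function and parameter names are hypothetical.

```python
import math


def search_nearest(point, candidates, max_candidates=1, max_distance=None, query_filter=None):
    """Return up to `max_candidates` (distance, name) pairs, nearest first.

    candidates is a list of (name, (x, y)) pairs; distances are planar.
    """
    if query_filter is not None:
        candidates = [c for c in candidates if query_filter(c[0])]
    scored = [(math.dist(point, xy), name) for name, xy in candidates]
    if max_distance is not None:
        scored = [s for s in scored if s[0] <= max_distance]
    scored.sort(key=lambda s: s[0])
    return scored[:max_candidates]
```

The defaults mirror the options table: one result and no distance limit unless the caller says otherwise. Returning the distance alongside each result plays the role that returnDistanceColumnName has in the Hive query.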


Spark

Spark Jobs

Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
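As a rough illustration of what a hexagon expressed as WKT looks like, the following Python sketch builds a closed POLYGON ring for a regular hexagon in plain planar coordinates. The function name hexagon_wkt and its parameters are hypothetical; the sketch does not reproduce the product's hexagon grid, levels, or IDs.

```python
import math


def hexagon_wkt(center_x, center_y, edge_length):
    """Return a WKT POLYGON for a regular, flat-topped hexagon (planar coordinates)."""
    ring = []
    for i in range(6):
        angle = math.radians(60 * i)  # one vertex every 60 degrees
        x = center_x + edge_length * math.cos(angle)
        y = center_y + edge_length * math.sin(angle)
        ring.append((round(x, 6), round(y, 6)))
    ring.append(ring[0])  # a WKT ring must be closed: repeat the first vertex
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


print(hexagon_wkt(0.0, 0.0, 1.0))
```

Any tool that reads WKT polygons can display a string of this shape, which is why a list of WKT hexagons is convenient for map display.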

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The product generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around each point in the first dataframe within which to search for points from the second dataframe

geohashPrecision Int The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join
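To give a feel for what the geohashPrecision value controls, here is a minimal Python sketch of the public geohash encoding algorithm, in which each additional character narrows the cell that contains a point. The function name geohash_encode is hypothetical, and the sketch makes no claim about how the product uses geohashes internally.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a WGS84 latitude/longitude as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value = value << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value = value << 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


for precision in (3, 5, 7):
    print(precision, geohash_encode(57.64911, 10.40744, precision))
```

A longer geohash always extends the shorter one, so a higher precision means more, smaller cells, which is consistent with the note above that higher values may require more memory.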


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
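Conceptually, a distance join pairs each record in the first data set with every record in the second that lies within the maximum distance. The Python sketch below shows that idea with a brute-force loop and the haversine formula on a spherical Earth. All names, the sample coordinates, and the distance_m output column are hypothetical; the product's Spark implementation (which uses a geohash-based primary join) is not reproduced here.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical approximation)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Brute-force join: pair each left row with every right row within max_distance_m."""
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                result.append({**l_row, **r_row, "distance_m": d})
    return result


customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.752000, "lat": 42.737000},
    {"poi": "far", "lon": -74.000000, "lat": 42.000000},
]
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([m["poi"] for m in matches])
```

The brute-force loop compares every pair of rows, which is only practical for small inputs; on large data sets a spatial pre-partitioning step is what keeps the number of comparisons manageable.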


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property: Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment the following line and change it from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute permissions for the owner and group (mode 775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
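For readers less familiar with octal permission modes, this short Python sketch decodes mode 775 using only the standard library. The helper name describe_mode is hypothetical, and nothing here is specific to the product.

```python
import stat


def describe_mode(mode):
    """Return the symbolic form of a directory permission mode, such as 0o775."""
    return stat.filemode(stat.S_IFDIR | mode)


mode = 0o775
print(describe_mode(mode))
print("group can write:", bool(mode & stat.S_IWGRP))
print("others can write:", bool(mode & stat.S_IWOTH))
```

With mode 775 the owner and the group can read, write, and traverse the directory, while all other users can only read and traverse it, which is why every service user that downloads data needs to be a member of the common group.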


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first object's centroid is within the second object

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or subquery

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
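The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. The function names below are hypothetical, and the sketch assumes case-sensitive matching, which may not match how the MI SQL engine compares strings.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern (assumed case-sensitive)."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park%"))
print(like("Parks", "Park_"))
print(like("Park", "Park_"))
```

Note that the pattern must match the entire value, so a filter that should find a word anywhere in an attribute needs a percent sign on both sides of it.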


Arithmetic Operators

Operator Definition

+ Addition; also the string concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List item and function argument separator

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a doubled single quote (two characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
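When filter expressions are assembled in code, the quoting rules above can be applied with two small helpers. This Python sketch is illustrative only: quote_literal and quote_identifier are hypothetical names, the sample values are invented, and identifiers that themselves contain a double quote are rejected because this section does not say how they should be escaped.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


expression = quote_identifier("Business Name") + " = " + quote_literal("O'Hara")
print(expression)
```

Building expressions through helpers like these keeps an apostrophe in the data from ending the string literal early.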


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com


Geometry Functions

• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions


Constructor Functions

The following Constructor functions are available

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

fromGeoJSON(String jsonGeometry)

Parameters

Parameter Type Description

jsonGeometry String The geometry in geoJSON format

Return Values

Return Type Description

WritableGeometry The geometry from geoJSON format

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;
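If the GeoJSON text is produced by another program before it is loaded into a Hive table, generating it with a JSON library avoids hand-written quoting mistakes. The Python sketch below builds the Point geometry used in the first example; point_geojson is a hypothetical helper name.

```python
import json


def point_geojson(x, y):
    """Serialize a GeoJSON Point geometry with coordinates in [x, y] order."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


geometry_text = point_geojson(100.0, 0.0)
print(geometry_text)
```

The resulting string is the kind of value that a geometry column read by FromGeoJSON is expected to hold.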


FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

fromWKT(String geometry [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The geometry in WKT format

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKT format

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');


FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

fromWKB(String geometry [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The WKB of the geometry in byte array format (byte[ ])

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKB format

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
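To see what the hexadecimal string in this example contains, the following Python sketch decodes it according to the standard WKB layout: one byte-order byte, a 4-byte geometry type, then two 8-byte doubles for a Point. The function name parse_wkb_point is hypothetical and the sketch handles only 2D points.

```python
import struct


def parse_wkb_point(hex_text):
    """Decode a hex-encoded 2D WKB Point and return its (x, y) coordinates."""
    data = bytes.fromhex(hex_text)
    byte_order = "<" if data[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    (geometry_type,) = struct.unpack_from(byte_order + "I", data, 1)
    if geometry_type != 1:  # 1 is the WKB code for Point
        raise ValueError("not a WKB Point")
    x, y = struct.unpack_from(byte_order + "dd", data, 5)
    return x, y


print(parse_wkb_point("010100000000000000000024400000000000002440"))
```

Decoding shows that the example value is the point with X = 10 and Y = 10.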


FromKML

Description

The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language)

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

fromKML(String geometry)

Parameters

Parameter Type Description

geometry String A KML string where only the geometry or geometry in placemark will be parsed

Return Values

Return Type Description

WritableGeometry The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;


ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X String|Number Y [SpatialInfo CRS])

Parameters

Parameter Type Description

X String or Number The X ordinate

Y String or Number The Y ordinate

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified XY coordinates If any of the argument values are invalid then an empty geometry will be returned in the output

Examples

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
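The following Python sketch mirrors the documented behavior of accepting either strings or numbers for the ordinates and returning an empty geometry when an argument is invalid. It is an illustration only: the helper names are hypothetical, and treating non-numeric text, missing values, and non-finite numbers as invalid is an assumption about what the function rejects.

```python
import math


def parse_ordinate(value):
    """Accept a number or a numeric string; return a float, or None if the value is invalid."""
    if isinstance(value, bool):  # assumption: booleans are not valid ordinates
        return None
    try:
        result = float(value)
    except (TypeError, ValueError):
        return None
    return result if math.isfinite(result) else None


def make_point_wkt(x, y):
    """Return WKT for the point, or 'POINT EMPTY' when either ordinate is invalid."""
    px, py = parse_ordinate(x), parse_ordinate(y)
    if px is None or py is None:
        return "POINT EMPTY"
    return f"POINT ({px} {py})"


print(make_point_wkt("-73.750333", 42.736103))
print(make_point_wkt("not a number", 42.736103))
```

Returning an empty geometry instead of raising an error lets a query continue over rows with bad coordinates, so it is worth filtering or counting empty results afterwards.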


Persistence Functions

The following Persistence functions are available

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns a text formatted in GeoJSON representation of geometry from the specified WritableGeometry instance

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


Predicate Functions

The following Predicate functions are available

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1 WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
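The three possible results (True, False, or Null) can be illustrated with a small Python sketch that uses axis-aligned rectangles as stand-ins for geometries. The function names and the tuple layout are hypothetical; the product's predicates operate on full geometries, not just rectangles.

```python
def rectangles_intersect(a, b):
    """a and b are (min_x, min_y, max_x, max_y); touching edges count as intersecting."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def disjoint(a, b):
    """Return None if either input is missing; otherwise True when no point is shared."""
    if a is None or b is None:
        return None
    return not rectangles_intersect(a, b)


print(disjoint((0, 0, 1, 1), (2, 2, 3, 3)))
print(disjoint((0, 0, 2, 2), (1, 1, 3, 3)))
print(disjoint(None, (0, 0, 1, 1)))
```

For non-null inputs, being disjoint is the opposite of intersecting, so a query can use whichever predicate reads more naturally.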

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty; otherwise, False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
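The exterior-minus-interior rule can be illustrated with a small planar sketch. This is not the product's implementation: it assumes Cartesian coordinates and uses the shoelace formula, and the names ring_area and polygon_area are hypothetical helpers introduced only for this example.

```python
from typing import List, Tuple

Ring = List[Tuple[float, float]]


def ring_area(ring: Ring) -> float:
    """Unsigned planar area of a ring, computed with the shoelace formula."""
    total = 0.0
    # Pair every vertex with its successor, wrapping around to the first vertex.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior: Ring, holes: List[Ring]) -> float:
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(h) for h in holes)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_area(square, [hole]))  # 96.0
```

A spherical computation would give different numbers for geographic coordinates; the sketch only shows how holes reduce the total.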

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units below.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
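To make the difference between the CARTESIAN and SPHERICAL computation types concrete, the sketch below contrasts a planar distance with a great-circle distance between two points. It is an illustration only: the haversine formula and the mean Earth radius are assumptions of this example, and the guide does not state which spherical formula the Distance function uses. The names haversine_km and planar_distance are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometers between two long/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_distance(x1: float, y1: float, x2: float, y2: float) -> float:
    """Straight-line distance in the units of the (projected) coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


# One degree of latitude along a meridian is roughly 111.2 km on this sphere.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))
# Identical points are at distance zero, and results are never negative.
print(haversine_km(10.0, 20.0, 10.0, 20.0))
```

Applying planar_distance directly to long/lat degrees would return a value in degrees, which is why the computation type should match the coordinate system of the geometry.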

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and the holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The unit of the offset distance. For valid values, see Linear Units below.

resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
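The effect of the resolution parameter can be pictured with a planar sketch that approximates the buffer of a single point by a regular polygon. This is an assumption-laden illustration, not the product's algorithm: it works in Cartesian coordinates, places the vertices on the circle itself, and the name point_buffer is hypothetical.

```python
import math
from typing import List, Tuple


def point_buffer(x: float, y: float, offset: float, resolution: int) -> List[Tuple[float, float]]:
    """Vertices of a regular polygon with `resolution` sides centered on (x, y)."""
    vertices = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        vertices.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    return vertices


octagon = point_buffer(0.0, 0.0, 50.0, 8)
print(len(octagon))  # 8 vertices: the buffer of a point is an octagon
```

Raising the resolution adds vertices, so the polygon follows the circle more closely at the cost of more data per buffered geometry.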

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
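For readers unfamiliar with the concept, the following sketch computes a planar convex hull of a point set using Andrew's monotone chain algorithm. The guide does not say which algorithm the ConvexHull function uses, so this is only an illustration of the definition above; convex_hull is a hypothetical helper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


sample = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(convex_hull(sample))  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

The interior points (2, 2) and (1, 3) are dropped because they do not change the smallest convex shape that encloses the set.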

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));

Intersection

Description

The Intersection function returns the geometry (such as a point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y ordinates from the transformed point geometry.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
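The bounds-filtering idea can be sketched outside Hive as follows. The example is a conceptual illustration under simple assumptions (planar coordinates, a geometry given as a list of vertices); mbr and filter_by_mbr are hypothetical helpers, not functions of the product.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def mbr(vertices: List[Point]) -> Tuple[float, float, float, float]:
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows: List[Dict[str, float]], bounds: Tuple[float, float, float, float]) -> List[Dict[str, float]]:
    """Keep only the XY rows that fall inside the bounding rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in rows if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


triangle = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
rows = [{"x": 1.0, "y": 1.0}, {"x": 9.0, "y": 7.5}, {"x": 11.0, "y": 2.0}]
print(mbr(triangle))                       # (0.0, 0.0, 10.0, 8.0)
print(filter_by_mbr(rows, mbr(triangle)))  # the first two rows
```

An MBR test is a coarse filter: the second row lies inside the rectangle but outside the triangle, so an exact predicate such as Within is still needed when precise containment matters.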

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
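To show how a point and a precision map to a string ID, the sketch below implements the widely published geohash encoding (alternating longitude and latitude bisection, five bits per base-32 character). It assumes that GeoHashID follows this public scheme, which the guide implies by calling the ID "well-known" but does not spell out; geohash_encode is a hypothetical name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Encode a WGS 84 long/lat pair as a geohash string of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every five bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(10.40744, 57.64911, 5))  # u4pru
```

Because each extra character subdivides the previous cell, IDs of nearby points share a prefix, which is what makes the IDs useful for sorting, searching, and grouping.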

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision Square hash cells appear square when displayed on a Popular Mercator map

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)


Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group variable (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'


Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group variable (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. | 'queryFilter', "Name Like 'Park'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.

Return Values

Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'"))
nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.


Spark

Spark Jobs

To process large data sets, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
  --master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
  /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
  -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 within which to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.


Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin
Hue notebook | Integrating the Location Intelligence Jar with Hue

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.

spark-submit --class com.example.spark.app.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.
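
As a rough illustration of the command above, the following shell sketch assembles the same spark-submit line from its parts and prints it instead of running it, which makes it easy to review before submitting. The function name build_submit_cmd and the LI_JAR variable are hypothetical helpers, not part of the product, and the default jar path assumes the /pb/li/software install location.

```shell
#!/bin/sh
# Hypothetical helper: LI_JAR defaults to the install path assumed in this guide.
LI_JAR=${LI_JAR:-/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar}

# Print (do not run) a spark-submit command that ships the LI jar via --jars.
# Usage: build_submit_cmd <driver class> <driver jar> [driver parameters...]
build_submit_cmd() {
  driver_class=$1
  driver_jar=$2
  shift 2
  printf 'spark-submit --class %s --master yarn --deploy-mode cluster --jars %s %s' \
    "$driver_class" "$LI_JAR" "$driver_jar"
  # Append any remaining arguments as driver parameters.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

build_submit_cmd com.example.spark.app.MyDriver /path/to/custom/driver.jar driverParameter1 driverParameter2
```

Once the printed command looks right, it can be pasted into a shell on a cluster edge node.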


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which has a PGD file created for each sub-TAB.

Also, a PGD file is no longer used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
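
Because each run handles a single TAB file, indexing a whole directory means one invocation per file. The shell sketch below prints one PGDBuilder command per TAB file in a directory without running anything. The function name pgd_commands is hypothetical, and invoking the utility as a bare PGDBuilder command is an assumption; substitute the launcher shipped in the utilities directory of your distribution.

```shell
#!/bin/sh
# Hypothetical helper: print one PGDBuilder command per TAB file in a directory.
# Assumption: the utility can be invoked as "PGDBuilder"; adjust as needed.
# Usage: pgd_commands <directory> <max concurrent processes>
pgd_commands() {
  dir=$1
  parallel=$2
  for tab in "$dir"/*.tab "$dir"/*.TAB; do
    # Skip glob patterns that matched nothing.
    [ -e "$tab" ] || continue
    printf 'PGDBuilder -f %s -p %s\n' "$tab" "$parallel"
  done
}
```

For example, `pgd_commands /data/tabs 4` lists the commands for every TAB file under the (hypothetical) /data/tabs directory, each limited to 4 concurrent processes.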


Download Permissions

Setting the download permissions allows multiple services to download the data and update it when required. You should have a common operating system group that contains all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
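
Mode 775 gives the owner and group read, write, and execute access (rwxrwxr-x) and gives everyone else read and execute only. As a minimal sketch of how the result of step 5 could be checked, the following script defines a hypothetical has_mode helper and tries it on a scratch directory, so it needs no sudo and touches no real download location.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a directory has the expected octal mode.
# Usage: has_mode <directory> <mode>
has_mode() {
  dir=$1
  expected=$2
  # GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'. Try GNU first.
  actual=$(stat -c '%a' "$dir" 2>/dev/null || stat -f '%Lp' "$dir")
  [ "$actual" = "$expected" ]
}

# Demonstrate on a scratch directory rather than a real download location.
scratch=$(mktemp -d)
chmod 775 "$scratch"
if has_mode "$scratch" 775; then
  echo "mode ok"
fi
rm -rf "$scratch"
```

On a cluster node, the same helper could be pointed at the directory named by pb.download.location.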


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
= <> != < <= > >= | Attribute operators.
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
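
To make the Like wildcards concrete, the shell sketch below mimics them by translating % to the shell wildcard * and _ to ?, then matching with a case statement. It is only an approximation for illustration: like_match is a hypothetical helper, the comparison is case-sensitive (the product's behavior may differ), and patterns that contain shell pattern characters such as [ or * are not handled.

```shell
#!/bin/sh
# Hypothetical helper: approximate Like matching with shell patterns.
# Usage: like_match <value> <like pattern>; exit status 0 means "matches".
like_match() {
  value=$1
  # Translate Like wildcards to shell wildcards: % -> *, _ -> ?
  glob=$(printf '%s' "$2" | sed -e 's/%/*/g' -e 's/_/?/g')
  case $value in
    $glob) return 0 ;;
    *) return 1 ;;
  esac
}

if like_match "Central Park" "%Park"; then
  echo "Central Park matches %Park"
fi
```

Here "%Park" accepts any value that ends in Park, while a pattern such as "P_rk" accepts exactly one character between P and rk.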


Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters.
' ' | String constant delimiters. See Quote Rules.
" " | Quoted identifier delimiters.
% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.
, | List item and function argument separators.
@ | Parameter names.

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a doubled single quote (two characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
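
When a filter expression is assembled in a script, the doubling rule above can be applied mechanically. The following shell sketch wraps a value in single quotes and doubles any single quote inside it; mi_sql_quote is a hypothetical helper written for this illustration, not something the product provides.

```shell
#!/bin/sh
# Hypothetical helper: quote a value as an MI SQL string literal by
# doubling embedded single quotes and wrapping the result in single quotes.
mi_sql_quote() {
  escaped=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "'%s'" "$escaped"
}

# Build a filter expression from a value that contains a single quote.
name="O'Hara"
printf 'Business = %s\n' "$(mi_sql_quote "$name")"
```

The helper covers string literals only; identifiers that need quoting still take double quotation marks, as described above.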


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com


Spatial

Constructor Functions

The following Constructor functions are available:

• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point

FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

fromGeoJSON(String jsonGeometry)

Parameters

Parameter Type Description

jsonGeometry String The geometry in geoJSON format

Return Values

Return Type Description

WritableGeometry The geometry from geoJSON format

Examples

SELECT FromGeoJSON('{"type": "Point", "coordinates": [100.0, 0.0]}');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

fromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The geometry in WKT format

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKT format

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

fromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The WKB of the geometry in byte array format (byte[ ])

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKB format

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
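To see what the hexadecimal string in the example above encodes, the bytes can be decoded by hand. The following Python sketch assumes the standard OGC WKB layout for a 2D point (a byte-order flag, a 32-bit geometry type, then two 64-bit doubles); the function name decode_wkb_point is hypothetical and only illustrates the format, not how the product parses WKB.

```python
import struct

# Hex string taken from the FromWKB example.
WKB_HEX = "010100000000000000000024400000000000002440"


def decode_wkb_point(hex_string: str):
    """Decode a 2D WKB Point and return its (x, y) ordinates."""
    data = bytes.fromhex(hex_string)
    # First byte: 1 = little-endian, 0 = big-endian.
    byte_order = "<" if data[0] == 1 else ">"
    (geometry_type,) = struct.unpack(byte_order + "I", data[1:5])
    if geometry_type != 1:
        raise ValueError("not a WKB Point")
    x, y = struct.unpack(byte_order + "dd", data[5:21])
    return x, y


if __name__ == "__main__":
    print(decode_wkb_point(WKB_HEX))
```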

FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

fromKML(String geometry)

Parameters

Parameter Type Description

geometry String A KML string. Only the geometry, or the geometry in a placemark, is parsed.

Return Values

Return Type Description

WritableGeometry The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter Type Description

X String or Number The X ordinate

Y String or Number The Y ordinate

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned.

Examples

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted in KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty; otherwise False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units below.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units:

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
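The rule that a polygon's area is its exterior ring minus its interior rings can be illustrated with a small planar calculation. The Python sketch below uses the shoelace formula and therefore corresponds only to a CARTESIAN-style computation; the function names are hypothetical, and the product's own SPHERICAL logic is not reproduced here.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring given as [(x, y), ...]."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
    hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
    print(polygon_area(square, [hole]))
```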

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
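To give a feel for what a spherical distance between two long/lat points involves, the sketch below uses the haversine formula in Python. This is an assumption made for illustration: the guide does not state which algorithm or earth radius the SPHERICAL computation type uses, so results from the Distance function may differ. The name haversine_m and the mean radius constant are hypothetical choices.

```python
import math

# Assumed mean earth radius in meters; the product may use a different model.
EARTH_RADIUS_M = 6371008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # One degree of longitude along the equator.
    print(haversine_m(0.0, 0.0, 1.0, 0.0))
```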

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry.

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
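The effect of the resolution parameter can be pictured with a simple planar construction: a point buffer becomes a closed ring with one vertex per straight-line segment. The Python sketch below is a Cartesian illustration only, with a hypothetical function name; it does not reproduce the product's SPHERICAL buffering or its MultiPolygon output.

```python
import math


def point_buffer_ring(x, y, offset, resolution):
    """Closed ring of `resolution` segments approximating a circle around (x, y)."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring by repeating the first vertex
    return ring


if __name__ == "__main__":
    # A resolution of 8 yields an octagon: 8 segments, 9 vertices including closure.
    for vertex in point_buffer_ring(5.0, 6.0, 50.0, 8):
        print(vertex)
```

Raising the resolution adds vertices, which is why larger values cost more time and space.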

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
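As an illustration of what a convex hull is, the Python sketch below applies Andrew's monotone chain algorithm to the vertices of the MULTIPOLYGON in the last example. The algorithm choice and the function names are assumptions made for illustration; the guide does not say how ConvexHull is implemented.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Vertices of the MULTIPOLYGON used in the example.
VERTICES = [(40, 40), (20, 45), (45, 30), (20, 35), (10, 30), (10, 10),
            (30, 5), (45, 20), (30, 20), (20, 15), (20, 25)]

if __name__ == "__main__":
    print(convex_hull(VERTICES))
```

Interior vertices, such as those of the hole, do not appear in the result because they lie inside the hull.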

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result is the geometry consisting of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
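For orientation, the conversion from epsg:4326 (long/lat in degrees) to epsg:3857 is commonly computed with the spherical Web Mercator formulas on a sphere of radius 6378137 m. The Python sketch below applies those formulas to the point from the second example. It is an illustration under that assumption, with a hypothetical function name, and the product's Transform function may differ in precision or method.

```python
import math

# Sphere radius used by the spherical Web Mercator projection, in meters.
RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project long/lat degrees to Web Mercator x, y in meters."""
    x = RADIUS_M * math.radians(lon_deg)
    y = RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


if __name__ == "__main__":
    # Same input ordinates as ST_POINT(30, 20).
    print(lonlat_to_web_mercator(30.0, 20.0))
```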

Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
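The bounds-filtering use case described above can be sketched outside Hive. The Python example below computes the four MBR values of a geometry's vertices and keeps only the XY records that fall inside that rectangle. The function names and the record layout are hypothetical and serve only to illustrate the idea behind the min and max observers.

```python
def mbr(vertices):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(records, bounds):
    """Keep records whose 'x' and 'y' values lie inside the bounds (inclusive)."""
    x_min, y_min, x_max, y_max = bounds
    return [r for r in records
            if x_min <= r["x"] <= x_max and y_min <= r["y"] <= y_max]


if __name__ == "__main__":
    polygon = [(10, 10), (30, 5), (45, 20), (20, 45), (10, 10)]
    rows = [{"id": 1, "x": 20, "y": 20}, {"id": 2, "x": 50, "y": 20}]
    print(mbr(polygon))
    print(filter_by_bounds(rows, mbr(polygon)))
```

A bounds filter like this is only a coarse test: a record inside the MBR can still lie outside the geometry itself.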

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
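
To show how a geohash ID maps back to a rectangular cell, here is a minimal Python sketch of the publicly documented geohash decoding scheme. It is an illustration only: the function name `geohash_bounds` is hypothetical, and the product's GeoHashBoundary UDF may differ in detail (for example, it returns a WritableGeometry rather than a tuple of bounds).

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(cell_id: str):
    """Return (min_lon, min_lat, max_lon, max_lat) for a standard geohash ID."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    is_lon = True  # bits alternate between axes, starting with longitude
    for ch in cell_id.lower():
        value = BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):  # each character carries 5 bits
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # upper half of the current interval
            else:
                rng[1] = mid  # lower half of the current interval
            is_lon = not is_lon
    return lon[0], lat[0], lon[1], lat[1]


bounds = geohash_bounds("ezs42")
```

Each additional character halves the cell several more times, which is why longer IDs correspond to smaller cells.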

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
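
The last query above counts points per grid cell. As a rough, product-independent illustration of the same idea, the following Python sketch encodes longitude/latitude pairs with the standard public geohash algorithm and counts points per cell. The function name `geohash_id` and the sample points are hypothetical, and the GeoHashID UDF is not guaranteed to behave identically in every edge case.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon: float, lat: float, precision: int) -> str:
    """Encode a WGS 84 longitude/latitude as a geohash of `precision` characters."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    value, bits, is_lon = 0, 0, True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if is_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


# Count points per cell, similar in spirit to GROUP BY on the hash ID.
points = [(-5.6, 42.6), (-5.61, 42.61), (10.40744, 57.64911)]
counts = Counter(geohash_id(x, y, 5) for x, y in points)
```

Because nearby points share an ID prefix, sorting or grouping by the ID keeps spatially close records together, which is what makes the hash useful for indexing and aggregation.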

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID('-73.750333', '42.736103', 3);

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points to search with. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
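
For readers unfamiliar with point-in-polygon searches, the following Python sketch shows the general idea using the classic ray-casting test. It is a simplified illustration under stated assumptions, not the product's implementation: it handles only simple polygon rings, the function names are hypothetical, and the sample "table" of polygons with attributes is invented.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def find_containing(x, y, polygons):
    """Return the attributes of every polygon that contains the point."""
    return [attrs for ring, attrs in polygons if point_in_polygon(x, y, ring)]


# Hypothetical data: two squares with attribute records.
polygons = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], {"state": "A", "capital": "Alpha"}),
    ([(4, 0), (8, 0), (8, 4), (4, 4)], {"state": "B", "capital": "Beta"}),
]
hits = find_containing(2, 2, polygons)
misses = find_containing(9, 9, polygons)
```

Like the UDTF, `find_containing` can return zero, one, or several records per input point, which is why the Hive query uses LATERAL VIEW to expand the results into rows.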

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points to search from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
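
The effect of the maxCandidates and maxDistance options can be sketched in a few lines of Python. This is an illustration only, assuming point candidates and great-circle (haversine) distances in meters; the function names and sample data are hypothetical, and the product's own distance calculation and geometry handling may differ.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(lon, lat, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    The defaults mirror the documented option defaults: one result, no distance limit.
    """
    scored = []
    for name, c_lon, c_lat in candidates:
        d = haversine_m(lon, lat, c_lon, c_lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Hypothetical candidates along the equator: (name, lon, lat).
candidates = [("c", 1.0, 0.0), ("a", 0.01, 0.0), ("b", 0.02, 0.0)]
two_nearest = search_nearest(0.0, 0.0, candidates, max_candidates=2)
within_1500m = search_nearest(0.0, 0.0, candidates, max_candidates=3, max_distance_m=1500)
```

Applying the distance limit before truncating to the candidate count means a search can return fewer rows than maxCandidates, or none at all.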

Spark

Spark Jobs

To process large data sets, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional Options that add extra attributes to the result of the join

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
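
To make the semantics of a distance join concrete, here is a small, product-independent Python sketch. It is a brute-force illustration under stated assumptions: it compares every pair of records using the haversine formula in miles and skips the geohash-based primary join that the real method uses to avoid comparing all pairs. The function names, field names, and sample rows are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.7613  # mean earth radius in statute miles


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles, distance_column="outputDistance"):
    """Pair every left row with every right row within max_miles of it."""
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_miles(l_row["longitude"], l_row["latitude"],
                                r_row["lon"], r_row["lat"])
            if d <= max_miles:
                # Merge both rows and attach the computed distance.
                joined.append({**l_row, **r_row, distance_column: d})
    return joined


# Hypothetical rows standing in for baseDF and joinDF.
customers = [{"id": 1, "longitude": 0.0, "latitude": 0.0}]
pois = [
    {"name": "near", "lon": 0.005, "lat": 0.0},  # roughly 0.35 miles away
    {"name": "far", "lon": 0.02, "lat": 0.0},    # roughly 1.38 miles away
]
result = join_by_distance(customers, pois, max_miles=0.5)
```

A pairwise comparison like this scales with the product of the two input sizes, which is why a grid-based pre-join such as the geohashPrecision step matters on large data sets.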

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case:

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue > Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment the following line and change it from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hue.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that includes every service user that needs to download the data. For example, if Hive and YARN jobs need to download data and use the same download location, then both the Hive and YARN operating system users should be members of the common group. The download directory should be assigned to this common group, with Read, Write, and Execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the session of every operating system user that was added to the new group (for example, pbuser).

5. Set the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
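
To make the 775 setting concrete, the following Python sketch decodes an octal mode into the familiar rwx notation. It is illustrative only: the function name permission_string is hypothetical and is not part of the product or of any operating system tool.

```python
def permission_string(mode):
    """Render the low nine permission bits of a mode as an rwx string."""
    chars = []
    for shift in (6, 3, 0):  # owner, group, others
        bits = (mode >> shift) & 0b111
        chars.append("r" if bits & 0b100 else "-")
        chars.append("w" if bits & 0b010 else "-")
        chars.append("x" if bits & 0b001 else "-")
    return "".join(chars)


# 775: owner and group may read, write, and execute; others may read and execute.
print(permission_string(0o775))
```

With mode 775, members of the common group can update the downloaded data, while other users can only read it and traverse the directory.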

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
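
The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is only an approximation for illustration: the helper names are hypothetical, and the sketch assumes case-sensitive matching and no escape character, neither of which is stated in this guide.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Troy", "T%"))    # True
print(like("Troy", "T_oy"))  # True
print(like("Troy", "T_y"))   # False
```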

Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List items and function argument separators

@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic is unable to parse them correctly, for example when they contain spaces or other special characters.

Examples

Identifiers that contain characters that would not normally be allowed (such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
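
When query text is assembled programmatically, the quoting rules above can be applied with two small helpers. The Python sketch below is a minimal illustration: the function names and the sample value are hypothetical, and the identifier helper deliberately does not handle identifiers that themselves contain double quotes, because this guide does not describe that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("/Samples/NamedTables/USA"),
    quote_literal("Joe's Diner"),
)
print(query)
```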

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

FromWKT(String geometry, [SpatialInfo CRS])

Parameters

Parameter | Type | Description
geometry | String | The geometry in WKT format.
CRS | String | Optional. The coordinate system for the geometry. Default = epsg:4326.

Return Values

Return Type | Description
WritableGeometry | The geometry from WKT format.

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;

SELECT FromWKT('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

FromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter | Type | Description
geometry | String | The WKB of the geometry in byte array format (byte[ ]).
CRS | String | Optional. The coordinate system for the geometry. Default = epsg:4326.

Return Values

Return Type | Description
WritableGeometry | The geometry from WKB format.

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
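
The hexadecimal string in the example above is a WKB-encoded point. As a rough illustration of the byte layout that FromWKB receives, the Python sketch below decodes it with the standard struct module: a one-byte byte-order flag, a 32-bit geometry type, and two 64-bit doubles. The helper name decode_wkb_point is hypothetical, the sketch handles only 2D points, and it does not represent how the product parses WKB.

```python
import struct


def decode_wkb_point(hex_string):
    """Decode a 2D WKB point given as a hexadecimal string into (x, y)."""
    raw = bytes.fromhex(hex_string)
    byte_order = "<" if raw[0] == 1 else ">"  # 1 = little endian, 0 = big endian
    (geometry_type,) = struct.unpack(byte_order + "I", raw[1:5])
    if geometry_type != 1:                    # 1 is the WKB code for Point
        raise ValueError("only WKB points are handled in this sketch")
    x, y = struct.unpack(byte_order + "dd", raw[5:21])
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))  # (10.0, 10.0)
```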

FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

FromKML(String geometry)

Parameters

Parameter | Type | Description
geometry | String | A KML string, of which only the geometry, or the geometry in a placemark, will be parsed.

Return Values

Return Type | Description
WritableGeometry | The geometry from KML format.

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter | Type | Description
X | String or Number | The X ordinate.
Y | String or Number | The Y ordinate.
CRS | String | Optional. The coordinate system for the geometry. Default = epsg:4326.

Return Values

Return Type | Description
WritableGeometry | The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned in the output.

Examples

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns a GeoJSON-formatted text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.

Return Values

Return Type | Description
GeoJSON String | The GeoJSON representation of a geometry.

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.

Return Values

Return Type | Description
String | The WKT representation of a geometry.

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.

Return Values

Return Type | Description
Byte[ ] | The WKB representation of a geometry, expressed as a byte array.

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), generated from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.

Return Values

Return Type | Description
String | The KML representation of a geometry, expressed as a hexadecimal encoded string.

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
Boolean | True if the two geometry objects have no points in common; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
Boolean | True if there is any direct position in common between the two geometries; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
Boolean | True if geometry1 overlaps geometry2; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
Boolean | True if geometry2 entirely contains geometry1; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;
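
The second example above tests whether a point location lies within a polygon boundary. For readers unfamiliar with that test, the Python sketch below shows the classic ray-casting approach for a single ring in planar (Cartesian) coordinates. It is an illustration under stated assumptions, not the product's implementation: the function name is hypothetical, holes are ignored, and points that fall exactly on the boundary are not treated specially.

```python
def point_in_ring(x, y, ring):
    """Return True if (x, y) lies inside the closed ring given as a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the edge is crossed to the right of the point
                inside = not inside
    return inside


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_ring(5, 5, square))   # True
print(point_in_ring(15, 5, square))  # False
```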

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter | Type | Description
inputGeometry | WritableGeometry | The input geometry to be checked for a null or empty value.

Return Values

Return Type | Description
Boolean | True if the geometry is null or empty; otherwise, False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
areaUnits | String | The desired return unit type. For valid values, see Area Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for areaUnits are the following area units:

Value | Description
sq mi | square miles
sq km | square kilometers
sq in | square inches
sq ft | square feet
sq yd | square yards
sq mm | square millimeters
sq cm | square centimeters
sq m | square meters
sq survey ft | square US Survey feet
sq nmi | square nautical miles
acre | acres
ha | hectares

Return Values

Return Type | Description
Double | The area of the geometry.

Examples

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
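
The rule that a polygon's area is the area of its exterior ring minus the areas of its interior rings can be sketched in Python with the shoelace formula. This is a planar (CARTESIAN-style) illustration only: the function names are hypothetical, rings are plain lists of (x, y) vertices, and the product's SPHERICAL computation is not reproduced here.

```python
def ring_area(ring):
    """Planar area enclosed by a ring of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(outer, [hole]))  # 96.0
```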

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
Array<WritableGeometry> | The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
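
To show what a pair of closest points means, the Python sketch below finds the closest pair between two collections of points by brute force in planar coordinates. It is a simplified stand-in, not the product's algorithm: the function name is hypothetical, both inputs are assumed to be non-empty lists of (x, y) tuples, and lines or polygons are not handled.

```python
import math


def closest_points(points_a, points_b):
    """Return the pair (a, b) with the smallest planar distance, a from points_a and b from points_b."""
    best = None
    for a in points_a:
        for b in points_b:
            d = math.dist(a, b)
            if best is None or d < best[0]:
                best = (d, a, b)
    return best[1], best[2]


print(closest_points([(0, 0), (5, 5)], [(6, 5), (20, 20)]))  # ((5, 5), (6, 5))
# When the inputs share a point, that shared point is returned for both sides.
print(closest_points([(0, 0), (1, 1)], [(1, 1), (9, 9)]))    # ((1, 1), (1, 1))
```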

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type.

Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles

Return Values

Return Type | Description
Double | The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
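
The difference between the CARTESIAN and SPHERICAL computation types can be illustrated for two points in Python. The sketch below contrasts a planar distance with a great-circle distance computed by the haversine formula. It rests on assumptions that are not taken from this guide: the function names are hypothetical, the Earth is modeled as a sphere with a mean radius of 6371008.8 m, and the product's actual spherical computation may differ.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def cartesian_distance(x1, y1, x2, y2):
    """Planar distance, in the same units as the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


print(cartesian_distance(0, 0, 3, 4))             # 5.0
print(round(spherical_distance_m(0, 0, 1, 0)))    # about 111195 m for one degree of longitude at the equator
```

Treating long/lat degrees as planar coordinates would report a distance of 1 for the second pair, which is why geographic coordinate systems use spherical logic.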

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type

Function Registration

create function Length as compbbigdataspatialhivemeasurementLength

Syntax

Length(WritableGeometry geometry String linearUnits [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type For valid values see Linear Units on page 35

computationType String Optional Indicates the logic to be used to interpret geometry coordinates The computation type is based on the coordinate system of the geometry being operated on

bull For geographic (longlat) coordinate systems Valid type = SPHERICAL (default)

bull For projected coordinate systems Valid types = CARTESIAN SPHERICAL (default)

bull For engineering coordinate systems Valid type = CARTESIAN (default)

CARTESIAN The geometry coordinates are interpreted using cartesian logic

SPHERICAL The geometry coordinates are interpreted using spherical logic

Linear Units

Valid values for unit type

Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles

Return Values

Return Type   Description
Double        The length of the geometry.

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
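The role of the linearUnits parameter can be sketched outside Hive. The Python example below sums the straight segments of a polyline and converts the result using the unit codes from the Linear Units table. It assumes input coordinates in meters and Cartesian logic; the function name and the conversion dictionary are illustrative and not part of the SDK.

```python
import math

# Meters per unit for the linear unit codes in the Linear Units table.
METERS_PER_UNIT = {
    "mi": 1609.344, "km": 1000.0, "in": 0.0254, "ft": 0.3048, "yd": 0.9144,
    "mm": 0.001, "cm": 0.01, "m": 1.0, "survey ft": 1200.0 / 3937.0, "nmi": 1852.0,
}


def polyline_length(points, unit="m"):
    """Sum of straight segment lengths; coordinates are assumed to be in meters."""
    meters = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return meters / METERS_PER_UNIT[unit]
```

For example, polyline_length([(0, 0), (1000, 0)], "km") evaluates the 1000 m segment and reports it in kilometers.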

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter         Type               Description
geometry          WritableGeometry   The input geometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units on page 37.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN   The geometry coordinates are interpreted using cartesian logic.
SPHERICAL   The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles

Return Values

Return Type   Description
Double        The perimeter of the geometry.

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
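The rule stated in the description, that a polygon's perimeter is the sum of the lengths of its exterior ring and its holes, can be sketched in a few lines of Python. This is a Cartesian illustration under assumed names (ring_length, polygon_perimeter), not the UDF's implementation.

```python
import math


def ring_length(ring):
    """Length of a closed ring; the closing edge back to the first vertex is included."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def polygon_perimeter(exterior, holes=()):
    """Perimeter = exterior ring length + the length of every hole ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)
```

A 10 x 10 square with a 2 x 2 hole therefore has a perimeter of 40 + 8 = 48 units, not 40.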

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])

Parameters

Parameter         Type               Description
geometry          WritableGeometry   The geometry to buffer.
offset            Number             The distance from the input geometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units on page 40.
resolution        Number             Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN   The geometry coordinates are interpreted using cartesian logic.
SPHERICAL   The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles

Return Values

Return Type        Description
WritableGeometry   A geometry consisting of all the points within the offset distance of the input geometry.

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
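The effect of the resolution parameter is easiest to see for a point buffer. The Python sketch below approximates a circle with a given number of straight segments, so a resolution of 8 yields an octagon. It is a Cartesian illustration only: buffer_point is a hypothetical helper, and placing the vertices exactly on the circle is an assumption, since the guide does not say how the UDF positions them.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` segments."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring
```

Doubling the resolution doubles the vertex count, which is why larger values cost more time and space.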

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The input geometry.

Return Values

Return Type        Description
WritableGeometry   The convex hull of the geometry.

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
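For readers who want to see what "smallest convex geometry" means in practice, here is a short Python sketch of Andrew's monotone chain algorithm applied to a set of vertices. It is offered as an illustration of the concept; the guide does not say which algorithm the ConvexHull UDF uses, and convex_hull is a name chosen for this example.

```python
def convex_hull(points):
    """Return the hull vertices in counterclockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counterclockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]
```

Interior vertices, such as those of a hole ring, never appear in the result because they fail the counterclockwise-turn test.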

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). It returns the geometry consisting of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.

Return Values

Return Type        Description
WritableGeometry   The geometry formed from the direct positions that are common to both input geometries.

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The source input geometry.
CRS         String             The destination coordinate system for the geometry.

Return Values

Return Type        Description
WritableGeometry   The geometry transformed to the destination coordinate system.

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
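As a worked illustration of what a coordinate system transformation involves, the Python sketch below applies the standard spherical Mercator formulas that take WGS 84 longitude/latitude (epsg:4326) to Web Mercator (epsg:3857). It covers only this one pair of systems, and wgs84_to_web_mercator is a hypothetical helper, not part of the SDK; the Transform UDF handles arbitrary source and destination systems.

```python
import math

R = 6378137.0  # WGS 84 semi-major axis in meters, used as the sphere radius by EPSG:3857


def wgs84_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

Longitude maps linearly to x, while latitude is stretched toward the poles, which is why the formula is not defined at latitudes of plus or minus 90 degrees.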

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.

Return Values

Return Type        Description
WritableGeometry   The geometry that represents the union of the input geometries.

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that Transform alone cannot convert an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y ordinates from the transformed geometry.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
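The bounds-filtering use case can be sketched in Python as follows. The mbr helper plays the role of the four ST_XMin, ST_YMin, ST_XMax, and ST_YMax calls, and filter_by_mbr keeps only the XY rows that fall inside those bounds. Both names and the tuple layout are assumptions made for this example.

```python
def mbr(points):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep the (x, y) rows that lie inside the bounds, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in rows if xmin <= r[0] <= xmax and ymin <= r[1] <= ymax]
```

An MBR test is only a coarse filter: a row can fall inside the rectangle and still lie outside the geometry itself.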

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The input geometry.

Return Values

Return Type   Description
Double        The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The input geometry.

Return Values

Return Type   Description
Double        The maximum X ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The input geometry.

Return Values

Return Type   Description
Double        The minimum X ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The input geometry.

Return Values

Return Type   Description
Double        The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The input geometry.

Return Values

Return Type   Description
Double        The maximum Y ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The input geometry.

Return Values

Return Type   Description
Double        The minimum Y ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter   Type     Description
UNIQUE_ID   String   The unique geohash identifier of a cell in a grid.

Return Values

Return Type        Description
WritableGeometry   A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter   Type               Description
X           Number or String   The longitude value of the point.
Y           Number or String   The latitude value of the point.
precision   Number             The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type        Description
WritableGeometry   The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
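To show how a cell ID maps back to a rectangular boundary, here is a Python sketch that decodes an ID with the publicly documented geohash scheme (base-32 characters, bits alternating between longitude and latitude). It assumes that the product's geohash IDs follow that public scheme, which the guide implies by calling the ID "well-known" but does not spell out; geohash_bounds is a hypothetical helper.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) of a standard geohash cell."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]
```

Each extra character halves the cell five more times, which is why longer IDs describe smaller cells.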

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter   Type               Description
X           Number or String   The longitude value of the point.
Y           Number or String   The latitude value of the point.
precision   Number             The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type   Description
String        The geohash ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
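The relationship between precision and ID length can be illustrated with the public geohash encoding, sketched below in Python. As before, this assumes the product's IDs follow the standard geohash scheme; geohash_id is a name chosen for this example and is not the UDF itself.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a long/lat point as a standard geohash of `precision` characters."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    value = 0
    bits = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)
```

Because a shorter ID is always a prefix of the longer ID for the same point, sorting or prefix-matching on the ID groups nearby cells together, which is what makes the IDs useful in ORDER BY and GROUP BY queries like the ones above.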

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter   Type     Description
UNIQUE_ID   String   The unique geohash identifier of a cell in a grid.

Return Values

Return Type        Description
WritableGeometry   A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter   Type               Description
X           Number or String   The longitude value of the point.
Y           Number or String   The latitude value of the point.
precision   Number             The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type        Description
WritableGeometry   The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter   Type               Description
X           Number or String   The longitude value of the point.
Y           Number or String   The latitude value of the point.
precision   Number             The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type   Description
String        The ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter   Type     Description
UNIQUE_ID   String   The unique geohash identifier of a cell in a grid.

Return Values

Return Type        Description
WritableGeometry   A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter   Type               Description
X           Number or String   The longitude value of the point.
Y           Number or String   The latitude value of the point.
precision   Number             The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type        Description
WritableGeometry   The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter   Type               Description
X           Number or String   The longitude value of the point.
Y           Number or String   The latitude value of the point.
precision   Number             The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type   Description
String        The ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter        Type               Description
inputPoint       WritableGeometry   A WritableGeometry representing a point.
dataSourcePath   String             The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
options          Map                Optional. Options that allow you to set return criteria, in <String, String> format.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

Options

shpCharset
    Description: the charset to use when reading a shapefile
    Example: 'shpCharset', 'utf-8'

shpCrs
    Description: the coordinate reference system to use when reading a shapefile
    Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
    Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
    Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
    Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
    Example: 'downloadLocation', '/pb/downloads'
    Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
    Description: the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
    Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type   Description
geometry      The polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
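The containment test at the heart of a point-in-polygon search can be sketched with the classic ray-casting rule: a point is inside a ring if a horizontal ray from it crosses the ring's edges an odd number of times. The Python below is a minimal illustration with hypothetical names and simple exterior rings only; it does not reflect how the UDTF reads TAB files or uses PGD indexes.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(x, y, polygons):
    """polygons: dict mapping a name to an exterior ring of (x, y) vertices."""
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]
```

Like the UDTF, polygons_containing returns every matching geometry rather than stopping at the first hit, so overlapping polygons each produce a result.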

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter        Type               Description
inputPoint       WritableGeometry   A WritableGeometry representing the point to search near.
dataSourcePath   String             The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
options          Map                Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

Options

maxCandidates
    Description: the maximum number of results to return (if not set, the default value is 1)
    Example: 'maxCandidates', '3'

maxDistance
    Description: the maximum distance to search for results (if not set, the default value is no limit)
    Example: 'maxDistance', '25'

distanceUnit
    Description: the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
    Example: 'distanceUnit', 'mi'

returnDistanceColumnName
    Description: the name of the column to use for returning the distance
    Example: 'returnDistanceColumnName', 'Miles'

shpCharset
    Description: the charset to use when reading a shapefile
    Example: 'shpCharset', 'utf-8'

shpCrs
    Description: the coordinate reference system to use when reading a shapefile
    Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
    Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
    Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
    Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
    Example: 'downloadLocation', '/pb/downloads'
    Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
    Description: the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
    Example: 'downloadGroup', 'pbdownloads'

queryFilter
    Description: search on a subset of the data based on an attribute
    Example: 'queryFilter', "Name Like 'Park'"
    Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The geometry or geometries in the specified TAB file or shapefile that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large data sets, Spark jobs are provided with Spectrum™ Location Intelligence for Big Data and Spectrum™ Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always maps to a hexagon with the same unique ID.
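To give a feel for the size of a default-level cell, the following sketch computes the area and inradius of a regular hexagon from its edge length. It uses only standard plane geometry. Treating the cell as a flat regular hexagon is a simplifying assumption, and the function names are hypothetical; none of this is part of the product API.

```python
import math


def hexagon_area(edge_m: float) -> float:
    """Area (square meters) of a regular hexagon with the given edge length."""
    return 3 * math.sqrt(3) / 2 * edge_m ** 2


def hexagon_inradius(edge_m: float) -> float:
    """Distance (meters) from the hexagon center to the midpoint of an edge."""
    return math.sqrt(3) / 2 * edge_m


# Approximate level 9 edge length at the equator, taken from the text above.
edge = 56.0
print(f"area: {hexagon_area(edge):.0f} m^2")
print(f"inradius: {hexagon_inradius(edge):.1f} m")
```

Under these assumptions a level 9 cell covers a little over 8,000 square meters, which can help when choosing a level for aggregation.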

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
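To illustrate what the geohashPrecision parameter controls, the sketch below encodes a coordinate with the standard public geohash scheme. Each additional character narrows the cell, so a higher precision yields more distinct join keys. This is an illustration of the general technique only; how the product computes or uses geohashes internally is not described here, and the function name is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Encode a WGS84 coordinate as a geohash of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


for p in (3, 5, 7):
    print(p, geohash_encode(57.64911, 10.40744, p))
```

A shorter geohash is always a prefix of a longer one for the same point, which is why precision can be read as a zoom level for the join key.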

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
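The semantics of a distance join can be sketched without Spark. The following example pairs every record in one list with the records in a second list that lie within a maximum great-circle distance. It is a brute-force illustration under stated assumptions (haversine distance on a spherical earth, hypothetical function and variable names); it does not reproduce the product's geohash-based implementation.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Return (left_id, right_id, distance_m) for every pair within range.

    `left` and `right` are lists of (id, longitude, latitude) tuples.
    """
    result = []
    for lid, llon, llat in left:
        for rid, rlon, rlat in right:
            d = haversine_m(llon, llat, rlon, rlat)
            if d <= max_distance_m:
                result.append((lid, rid, d))
    return result


customers = [("c1", -73.750333, 42.736103)]
pois = [("near", -73.752, 42.737), ("far", -73.9, 42.9)]
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([(l, r) for l, r, _ in matches])
```

The third element of each result tuple plays the role of the optional distance column in the joined output.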

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue -> Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have a PGD file created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
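To see what mode 775 grants, the following sketch applies it to a throwaway directory and prints the resulting permission string. It assumes a POSIX file system and uses a temporary directory as a stand-in for the real download location.

```python
import os
import stat
import tempfile

# A temporary directory stands in for the download location (assumption).
with tempfile.TemporaryDirectory() as download_dir:
    os.chmod(download_dir, 0o775)
    st_mode = os.stat(download_dir).st_mode
    mode = stat.S_IMODE(st_mode)          # permission bits only
    mode_string = stat.filemode(st_mode)  # ls-style rendering

print(oct(mode))
print(mode_string)
```

The rendered string shows read, write, and execute for the owner and the group, and read and execute only for everyone else, which is why every job user must belong to the common group in order to update downloaded data.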

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
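The wildcard behavior described for the Like operator can be sketched by translating a pattern into a regular expression. The helper names are hypothetical, and the sketch assumes case-sensitive matching, which the table above does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))  # pattern must match the entire value
print(like("Park Avenue", "%Park"))
print(like("Park", "P_rk"))
```

Because the match is anchored at both ends, a pattern such as %Park accepts values that end in Park but rejects values that merely contain it.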

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers containing illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
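When building filter expressions programmatically, the quoting rules above can be applied with two small helpers. This is a minimal sketch with hypothetical function names. It assumes identifiers never contain a double quote, because the rules above do not say how one would be escaped.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because their escaping rule is not
    covered by the documentation (assumption made for this sketch).
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
country = quote_literal("Canada")
print(f"SELECT * FROM {table} WHERE Country = {country}")
print(quote_literal("O'haras"))
```

Quoting values through one function keeps an embedded apostrophe from ending the literal early.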

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

FromWKB(String geometry, [SpatialInfo CRS])

Parameters

Parameter Type Description

geometry String The WKB of the geometry in byte array format (byte[ ]).

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry from WKB format

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
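To show what the hexadecimal string in this example contains, the sketch below decodes a WKB point by hand. It follows the standard WKB layout (one byte-order byte, a four-byte geometry type, then two eight-byte doubles) and handles only 2D points. The function name is hypothetical and the code is independent of the product.

```python
import struct


def decode_wkb_point(hex_wkb: str):
    """Decode a 2D WKB point given as a hexadecimal string into (x, y)."""
    raw = bytes.fromhex(hex_wkb)
    order = "<" if raw[0] == 1 else ">"  # 1 = little endian, 0 = big endian
    (geom_type,) = struct.unpack(order + "I", raw[1:5])
    if geom_type != 1:  # 1 is the WKB code for Point
        raise ValueError(f"not a WKB point (geometry type {geom_type})")
    x, y = struct.unpack(order + "dd", raw[5:21])
    return x, y


# Byte order + geometry type, followed by the X and Y doubles.
wkb_hex = "0101000000" + "0000000000002440" + "0000000000002440"
print(decode_wkb_point(wkb_hex))  # (10.0, 10.0)
```

The example value is therefore the point with X = 10 and Y = 10.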

FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

FromKML(String geometry)

Parameters

Parameter Type Description

geometry String A KML string. Only the geometry, or the geometry in a placemark, is parsed.

Return Values

Return Type Description

WritableGeometry The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter Type Description

X String or Number The X ordinate.

Y String or Number The Y ordinate.

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned in the output.

Examples

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted in KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function determines whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty; otherwise False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

The following area units are valid values for areaUnits:

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
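
The ring arithmetic given in the Area description (exterior ring area minus interior ring areas) can be sketched in plain Python. This is only an illustration using the planar shoelace formula, which loosely corresponds to a CARTESIAN computation; it is not the UDF's implementation, and the names ring_area and polygon_area are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as [(x, y), ...] (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


# A 10 x 10 square with a 2 x 2 hole, in arbitrary planar units.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
area = polygon_area(outer, [hole])
```

The result is expressed in the square of whatever unit the coordinates use; converting to one of the area units listed above would be a separate step.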

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
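
To illustrate why the computationType matters, the following Python sketch contrasts a planar (Cartesian) distance with a great-circle distance between two points. It is a minimal illustration under stated assumptions: the guide does not say which spherical formula or Earth radius the UDF uses, so the haversine formula, the EARTH_RADIUS_M constant, and both function names are assumptions made for this example.

```python
import math

# Mean Earth radius in meters; an assumption for this sketch only.
EARTH_RADIUS_M = 6371008.8


def cartesian_distance(p, q):
    """Straight-line distance between two (x, y) points, in coordinate units."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_m(p, q, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two (lon, lat) points in degrees (haversine)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


planar = cartesian_distance((0.0, 0.0), (3.0, 4.0))
quarter_equator = spherical_distance_m((0.0, 0.0), (90.0, 0.0))
```

Applying the Cartesian formula to long/lat degrees yields a number in degrees rather than a ground distance, which is consistent with SPHERICAL being the only valid type listed for geographic coordinate systems.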

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The linear unit of the offset. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
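
The effect of the resolution parameter can be pictured with a small planar sketch: the buffer of a point is approximated by a regular polygon with resolution sides. This Python illustration assumes Cartesian coordinates and assumes the vertices lie exactly on the circle of radius offset; the guide does not say how the UDF places vertices, and point_buffer is a hypothetical name.

```python
import math


def point_buffer(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` vertices."""
    vertices = []
    for k in range(resolution):
        angle = 2 * math.pi * k / resolution
        vertices.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    return vertices


# With a resolution of 8, the buffer of a point is an octagon.
octagon = point_buffer(0.0, 0.0, 50.0, 8)
```

Raising the resolution brings the polygon closer to a true circle at the cost of more vertices, which matches the note that larger resolution values take more time and space to compute.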

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
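
As a rough picture of what "smallest convex geometry that contains all the points" means, the following Python sketch computes a planar convex hull with Andrew's monotone chain algorithm. The algorithm choice and the names cross and convex_hull are assumptions for illustration; the guide does not state how the UDF computes the hull.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(points)
```

Interior points such as (2, 2) and (1, 3) do not appear in the hull; only the four corners remain.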

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The returned geometry consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another with Transform alone. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
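
The MBR filtering use case described above can be sketched outside Hive as follows. This Python illustration treats a geometry as a plain list of (x, y) vertices and XY table rows as dictionaries; the names mbr and filter_by_mbr and the sample data are hypothetical, and the four values returned by mbr play the role of ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep the rows whose x and y columns fall inside the bounds (edges included)."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in rows if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


polygon = [(0.0, 0.0), (4.0, 1.0), (3.0, 5.0), (-1.0, 2.0)]
rows = [
    {"id": 1, "x": 1.0, "y": 1.0},
    {"id": 2, "x": 9.0, "y": 1.0},
    {"id": 3, "x": -1.0, "y": 5.0},
]
bounds = mbr(polygon)
kept = filter_by_mbr(rows, bounds)
```

An MBR test is a coarse filter: row 3 lies on a corner of the rectangle but outside the polygon itself, so an exact predicate would still be needed where precision matters.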

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

Three types of UDFs are provided, one for each grid cell shape:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and for interoperability with other systems.

Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
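
To show how a point, a precision, a cell ID, and a cell boundary relate to one another, the following Python sketch implements the publicly documented geohash encoding. It assumes that the "well-known" ID produced by GeoHashID follows this standard scheme, which this guide does not confirm; geohash_encode and BASE32 are names chosen for the example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon, lat, precision):
    """Return (cell_id, (xmin, ymin, xmax, ymax)) for the cell that contains the point."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    refine_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if refine_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = bits * 2 + 1
                lon_lo = mid
            else:
                bits = bits * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = bits * 2 + 1
                lat_lo = mid
            else:
                bits = bits * 2
                lat_hi = mid
        refine_lon = not refine_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars), (lon_lo, lat_lo, lon_hi, lat_hi)


cell_id, bounds = geohash_encode(-5.6, 42.6, 5)
```

Each additional character halves the cell several more times, which is why longer strings mean smaller cells, and IDs that share a prefix belong to the same coarser cell, which is what makes the IDs convenient to sort, search, and group.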

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique hexagon hash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique square hash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x y precision) FROM hivetable

SELECT SquareHashBoundary(-73750333 42736103 3)

SELECT SquareHashBoundary(-73750333 42736103 3)

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;
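
The relationship between precision and cell size can be illustrated with a quadkey-style tiling of a Mercator map. The sketch below is not the product's implementation: it assumes a Bing-Maps-style quadkey scheme (an assumption suggested only by digit-string IDs such as 03332), and the function name lonlat_to_quadkey is hypothetical. Each additional character splits a cell into four, so a longer key means a smaller cell, and the key at a lower precision is a prefix of the key at a higher precision for the same point.

```python
import math

# Latitude limit of the Web Mercator projection (assumed projection for this sketch).
MAX_LAT = 85.05112878


def lonlat_to_quadkey(lon, lat, precision):
    """Return a quadkey-style cell ID of length `precision` for a WGS 84 point.

    Illustrative only: this is an assumed scheme, not the SquareHashID algorithm.
    """
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    # Normalise the point to [0, 1) x [0, 1) on the square Mercator map.
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1.0 + sin_lat) / (1.0 - sin_lat)) / (4.0 * math.pi)
    # Tile column and row at this precision (2**precision tiles per side).
    n = 1 << precision
    tx = min(n - 1, max(0, int(x * n)))
    ty = min(n - 1, max(0, int(y * n)))
    # Interleave the bits of column and row, most significant bit first.
    digits = []
    for i in range(precision, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tx & mask:
            digit += 1
        if ty & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


coarse = lonlat_to_quadkey(-73.750333, 42.736103, 3)
fine = lonlat_to_quadkey(-73.750333, 42.736103, 10)
print(coarse, fine, fine.startswith(coarse))
```

Because of the prefix property, grouping by a truncated ID is one way to aggregate fine cells into coarser ones under this assumed scheme.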


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies on or within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'


Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
    FromWKT(pip_points.geometry, pip_points.crs),
    'STATECAP.TAB',
    map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
        'downloadLocation', '/pb/pip/download',
        'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
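
To make the point-in-polygon idea concrete, the following is a minimal sketch of the classic ray-casting test, returning one result per polygon that contains the point, much as the UDTF returns one row per match. It is an illustration under stated assumptions, not the product's algorithm: coordinates are treated as planar, polygons are single rings without holes, points exactly on a boundary are not handled consistently, and the names point_in_ring, local_point_in_polygon, and the sample features are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring of (x, y) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge from vertex j to vertex i straddle the horizontal line at y?
        if (yi > y) != (yj > y):
            # x coordinate where that edge crosses the horizontal line.
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def local_point_in_polygon(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    x, y = point
    return [attrs for ring, attrs in features if point_in_ring(x, y, ring)]


# Two overlapping squares standing in for polygons read from a data source.
features = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], {"state": "A", "capital": "Alpha"}),
    ([(5, 5), (15, 5), (15, 15), (5, 15)], {"state": "B", "capital": "Beta"}),
]

print(local_point_in_polygon((7, 7), features))    # inside both squares
print(local_point_in_polygon((2, 2), features))    # inside the first square only
print(local_point_in_polygon((20, 20), features))  # inside neither
```

A point that falls in overlapping polygons yields several results, which is why the query above uses LATERAL VIEW to expand each match into its own row.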


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', 'Name Like \'%Park%\''

Return Values

Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map('maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', 'Name Like \'%Park%\'')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
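
The effect of the maxCandidates and maxDistance options can be sketched with a brute-force nearest search over point data. This is only an illustration of the semantics described above, not the UDTF's implementation: it assumes point candidates, great-circle (haversine) distances in meters on a spherical earth, and no spatial index. The names haversine_m and search_nearest and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius; a spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    Candidates farther away than max_distance_m are dropped; None means no limit.
    """
    lon, lat = point
    scored = []
    for name, (clon, clat) in candidates.items():
        d = haversine_m(lon, lat, clon, clat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Approximate (longitude, latitude) values used only for illustration.
capitals = {
    "Albany": (-73.7562, 42.6526),
    "Hartford": (-72.6734, 41.7658),
    "Boston": (-71.0589, 42.3601),
}
search_point = (-73.6918, 42.7284)

print(search_nearest(search_point, capitals))                    # default: one result
print(search_nearest(search_point, capitals, max_candidates=3))  # up to three results
print(search_nearest(search_point, capitals, max_candidates=3, max_distance_m=50_000))
```

With both options set, the distance limit is applied first and the candidate limit caps whatever remains, so a search point can return fewer rows than maxCandidates.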


Spark

Spark Jobs

To process large data sets, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around each point in the first dataframe within which to search for points in the second dataframe.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.


Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

import com.mapinfo.midev.unit.{Length, LinearUnit}

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
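
When choosing a geohashPrecision value, it can help to see how quickly cell size shrinks as precision grows. The sketch below assumes the join uses the standard base-32 geohash (5 bits per character, alternating longitude and latitude bits, starting with longitude); the guide does not state this, so treat the numbers as indicative only. The function name geohash_cell_size_degrees and the kilometers-per-degree constant are illustrative choices, not part of the product API.

```python
KM_PER_DEGREE_AT_EQUATOR = 111.32  # approximate length of one degree at the equator


def geohash_cell_size_degrees(precision):
    """Return (width, height) in degrees of a standard geohash cell."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # longitude receives the extra bit when bits is odd
    lat_bits = bits // 2
    return 360.0 / (1 << lon_bits), 180.0 / (1 << lat_bits)


for precision in range(1, 13):
    width, height = geohash_cell_size_degrees(precision)
    print(
        f"precision {precision:2d}: "
        f"{width * KM_PER_DEGREE_AT_EQUATOR:12.5f} km wide x "
        f"{height * KM_PER_DEGREE_AT_EQUATOR:12.5f} km high at the equator"
    )
```

Under this assumption, each extra character shrinks the cell area by a factor of 32, which is consistent with the note above that higher precision values may need more memory.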


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table and proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
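
As a quick illustration of what mode 775 grants, the sketch below decodes permission bits using Python's standard stat module. It only inspects numeric modes and touches no files; the helper names describe_dir_mode and group_can_update are hypothetical and are not part of the product.

```python
import stat


def describe_dir_mode(mode):
    """Render a numeric directory mode in ls -l style."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_update(mode):
    """True if the group has read, write, and execute permission."""
    needed = stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP
    return (mode & needed) == needed


for mode in (0o775, 0o755):
    print(oct(mode), describe_dir_mode(mode), "group can update:", group_can_update(mode))
```

With 775, members of the shared group can create and replace downloaded data, whereas with 755 only the owner could, so a second service sharing the download location would be unable to update it.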


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
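
The wildcard rules for the Like operator can be sketched by translating a pattern into a regular expression. This is an illustration of the semantics described in the table, not the MI SQL parser: it assumes case-sensitive matching against the whole value and no escape character, neither of which the guide specifies, and the function names like_to_regex and like are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    % matches zero or more characters; _ matches exactly one character.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


for value, pattern in [
    ("Central Park", "%Park%"),
    ("Parks", "Park_"),
    ("Park", "Park_"),
    ("Parking", "P_rk%"),
]:
    print(f"{value!r} Like {pattern!r}: {like(value, pattern)}")
```

Escaping every non-wildcard character matters because characters such as a period would otherwise be treated as regular-expression wildcards rather than literal text.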


Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List item and function argument separators
@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
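
When filter expressions are assembled programmatically, the quote rules above can be applied with small helpers such as the following. This is a minimal sketch, assuming only the rules stated in this section; the helper names quote_literal and quote_identifier are hypothetical, and because the guide does not describe how to escape a double quote inside an identifier, the sketch rejects such names rather than guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Escaping of embedded double quotes is not covered by the quote rules,
    so such names are rejected instead of being escaped.
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
print(f"SELECT * FROM {table} WHERE Country = {quote_literal('Canada')}")
print(f"SELECT * FROM Streets WHERE Business = {quote_literal(chr(79) + chr(39) + 'haras')}")
```

Doubling embedded single quotes keeps a value that contains an apostrophe from terminating the literal early.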


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


FromKML

Description

The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

FromKML(String geometry)

Parameters

Parameter Type Description

geometry String A KML string where only the geometry or geometry in placemark will be parsed

Return Values

Return Type Description

WritableGeometry The geometry from KML format

Examples

SELECT FromKML(t.geometry) FROM hivetable t;


ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter Type Description

X String or Number The X ordinate

Y String or Number The Y ordinate

CRS String Optional. The coordinate system for the geometry. Default = epsg:4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified XY coordinates. If any of the argument values are invalid, an empty geometry is returned.

Examples

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;


Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns a GeoJSON-formatted text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKB

Description

The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
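
To make the byte-array return value more concrete, the following is a minimal Python sketch of how a 2D point is laid out in standard OGC WKB. It is not the product's implementation: the helper names `point_to_wkb` and `wkb_to_point` are hypothetical, and whether ToWKB emits little-endian bytes or any extra header is an assumption this guide does not confirm.

```python
import struct

def point_to_wkb(x, y):
    """Encode a 2D point as OGC WKB (hypothetical helper, little-endian assumed)."""
    # 1 byte: byte order (1 = little-endian)
    # 4 bytes: geometry type (1 = Point)
    # 2 x 8 bytes: X and Y as IEEE 754 doubles
    return struct.pack("<BIdd", 1, 1, x, y)

def wkb_to_point(data):
    """Decode a WKB point written in either byte order."""
    prefix = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(prefix + "Idd", data[1:21])
    if geom_type != 1:
        raise ValueError("not a WKB Point")
    return x, y

if __name__ == "__main__":
    wkb = point_to_wkb(-73.750333, 42.736103)
    print(len(wkb), wkb.hex())
    print(wkb_to_point(wkb))
```

A point therefore occupies 21 bytes, which is why binary storage is usually more compact than the equivalent WKT or GeoJSON text.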


ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));


Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore

FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
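
The point-in-boundary test behind a query like the one above can be illustrated with a small Python sketch. This is the classic ray-casting technique on planar coordinates, shown only to convey the idea; the guide does not describe how Within is implemented, and the function name `point_in_ring` is hypothetical.

```python
def point_in_ring(x, y, ring):
    """Return True if (x, y) lies inside the polygon ring (ray casting).

    ring is a list of (x, y) vertices; it may or may not repeat the
    first vertex at the end. Points exactly on an edge are not handled
    specially in this sketch.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +X cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

if __name__ == "__main__":
    risk_zone = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
    print(point_in_ring(5, 5, risk_zone))   # a location inside the zone
    print(point_in_ring(15, 5, risk_zone))  # a location outside the zone
```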


IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty; otherwise False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;


Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units.


computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres


ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
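
The rule stated in the description, exterior ring area minus interior ring areas, can be sketched in Python for the CARTESIAN case using the shoelace formula. This is an illustration under that assumption only; the SPHERICAL computation and the unit conversion performed by the Area function are not covered, and the helper names are hypothetical.

```python
def ring_area(ring):
    """Planar area enclosed by a ring of (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def polygon_area(exterior, holes=()):
    """Area of a polygon: exterior ring minus the areas of its interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)

if __name__ == "__main__":
    exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
    print(polygon_area(exterior, [hole]))
```

The result is in squared coordinate units; converting to a unit such as sq km would be a separate step.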


ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;


Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
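
To show why the computationType matters, here is a hedged Python sketch that contrasts a Cartesian distance with a spherical (great-circle) distance between two points. The haversine formula and the mean earth radius of 6371008.8 m are assumptions chosen for illustration; the guide does not state which spherical model the Distance function uses, and the helper names are hypothetical.

```python
import math

def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)

def spherical_distance_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in meters (haversine; radius is an assumption)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

if __name__ == "__main__":
    # One degree of longitude is much shorter at 60 degrees latitude than at
    # the equator, which a Cartesian reading of long/lat values would miss.
    print(spherical_distance_m(0, 0, 1, 0))
    print(spherical_distance_m(0, 60, 1, 60))
    print(cartesian_distance(0, 60, 1, 60))
```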


Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
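
The definition in the description, the sum of the lengths of all rings including holes, can be sketched in Python for planar (CARTESIAN) coordinates. The helper names are hypothetical and the sketch does not reproduce the product's spherical logic or unit handling.

```python
import math

def ring_length(ring):
    """Length of a closed ring of (x, y) vertices in coordinate units."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # the closing edge has zero length if the ring is already closed
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def polygon_perimeter(exterior, holes=()):
    """Perimeter of a polygon: exterior ring length plus the lengths of its holes."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)

if __name__ == "__main__":
    exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
    print(polygon_perimeter(exterior, [hole]))
```

Note that holes add to the perimeter, whereas they subtract from the area.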


Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.


computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles


Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
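
The effect of the resolution parameter on a point buffer can be sketched in Python: the circle around the point is approximated by a polygon with that many straight segments. This is a planar illustration only, with a hypothetical helper name; whether the product places vertices on the circle (as here) or outside it is not stated in this guide.

```python
import math

def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` segments.

    Returns a closed ring of (x, y) vertices lying on the circle of
    radius `offset`; the first vertex is repeated at the end.
    """
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])
    return ring

if __name__ == "__main__":
    octagon = buffer_point(5, 6, 50, 8)   # resolution 8 gives an octagon
    finer = buffer_point(5, 6, 50, 12)    # resolution 12, as in the examples above
    print(len(octagon) - 1, len(finer) - 1)
```

Higher resolutions follow the circle more closely but produce more vertices per buffer, which is the time and space cost the parameter description refers to.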


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
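
As an illustration of what "smallest convex geometry" means, the sketch below computes a planar convex hull with Andrew's monotone chain algorithm, applied to the vertices of the MULTIPOLYGON used in the example above. The algorithm choice and the function name `convex_hull` are assumptions for illustration; the guide does not say which algorithm the ConvexHull function uses.

```python
def convex_hull(points):
    """Convex hull of 2D points, counter-clockwise (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

if __name__ == "__main__":
    vertices = [(40, 40), (20, 45), (45, 30),
                (20, 35), (10, 30), (10, 10), (30, 5), (45, 20),
                (30, 20), (20, 15), (20, 25)]
    print(convex_hull(vertices))
```

Vertices of the interior ring and any concave corners drop out of the result, since they lie inside the hull.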


Intersection

Description

The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_Point(30, 20), 'epsg:3857');
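
For readers unfamiliar with what a transformation between these two coordinate systems involves, the following Python sketch applies the widely published spherical Web Mercator formulas that define EPSG:3857 coordinates from EPSG:4326 longitude and latitude. It is an illustration of the mathematics only, under the assumption of the standard 6378137 m sphere radius; it is not the product's Transform implementation, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator

def lonlat_to_web_mercator(lon, lat):
    """Project EPSG:4326 degrees to EPSG:3857 meters (spherical Web Mercator)."""
    x = math.radians(lon) * EARTH_RADIUS_M
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * EARTH_RADIUS_M
    return x, y

if __name__ == "__main__":
    # The same point as in the second example above: X = 30, Y = 20.
    print(lonlat_to_web_mercator(30, 20))
```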


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another directly. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
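
The bounds-filtering use case described above can be sketched in Python: compute the minimum and maximum X and Y of a geometry's vertices, then keep only the XY records that fall inside that rectangle. The helper names `mbr` and `filter_by_mbr` are hypothetical, and the sketch works on plain coordinate tuples rather than on WritableGeometry values.

```python
def mbr(coords):
    """Minimum bounding rectangle of (x, y) vertices as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

def filter_by_mbr(rows, bounds):
    """Keep the (x, y) rows that fall inside the bounds (edges included)."""
    x_min, y_min, x_max, y_max = bounds
    return [(x, y) for x, y in rows
            if x_min <= x <= x_max and y_min <= y <= y_max]

if __name__ == "__main__":
    polygon = [(10, 10), (30, 5), (45, 20), (20, 45), (10, 10)]
    xy_table = [(15, 15), (50, 50), (44, 6), (0, 0)]
    bounds = mbr(polygon)
    print(bounds)
    print(filter_by_mbr(xy_table, bounds))
```

A rectangle test like this is a cheap pre-filter; records that pass it may still lie outside the geometry itself and would need an exact predicate such as Within.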


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
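To make the idea of hashing a point into a grid cell concrete, the following is a minimal Python sketch of the publicly documented geohash encoding. It is an illustration only, not the product's implementation: the function name geohash_id is hypothetical, and it is an assumption that the GeoHashID UDF follows the same well-known scheme.

```python
# Illustrative sketch of the public geohash algorithm (assumption: the
# product's GeoHashID follows this well-known scheme; not verified here).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Return a geohash string of length `precision` for a WGS 84 point."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0          # number of bits collected for the current character
    value = 0         # the 5-bit value being built
    use_lon = True    # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_id(-73.750333, 42.736103, 3))
```

Because each additional character halves the cell several more times, a longer ID always starts with the shorter ID of the enclosing cell, which is what makes such IDs sortable and usable as grouping keys.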


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description

UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description

WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description

X | Number or String | The longitude value of the point.

Y | Number or String | The latitude value of the point.

precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).

Return Values

Return Type | Description

WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
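As a rough illustration of how a cell ID maps back to a rectangular boundary, the sketch below decodes a standard geohash string into its bounding box and formats it as WKT. The names geohash_bounds and bounds_to_wkt are hypothetical, and it is an assumption that the product's cells match the public geohash scheme; the WritableGeometry returned by the UDF is not modeled here.

```python
# Illustrative sketch: decode a standard geohash into its bounding box.
# Assumption: cells follow the public geohash scheme, not verified against
# the product.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) for a geohash cell."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):   # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid             # keep the upper half
            else:
                rng[1] = mid             # keep the lower half
            use_lon = not use_lon
    return (lon[0], lat[0], lon[1], lat[1])


def bounds_to_wkt(bounds):
    """Format a bounding box as a closed WKT polygon ring."""
    x0, y0, x1, y1 = bounds
    return (f"POLYGON(({x0} {y0}, {x1} {y0}, {x1} {y1}, "
            f"{x0} {y1}, {x0} {y0}))")


if __name__ == "__main__":
    print(bounds_to_wkt(geohash_bounds("dre")))
```

Each extra character splits the cell again, so the box for a 3-character ID is much larger than the box for a 10-character ID.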


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description

X | Number or String | The longitude value of the point.

Y | Number or String | The latitude value of the point.

precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).

Return Values

Return Type | Description

String | The geohash ID of the grid cell at the specified precision that contains the point.

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description

UNIQUE_ID | String | The unique identifier of a cell in a grid.

Return Values

Return Type | Description

WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description

X | Number or String | The longitude value of the point.

Y | Number or String | The latitude value of the point.

precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).

Return Values

Return Type | Description

WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description

X | Number or String | The longitude value of the point.

Y | Number or String | The latitude value of the point.

precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).

Return Values

Return Type | Description

String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description

UNIQUE_ID | String | The unique identifier of a cell in a grid.

Return Values

Return Type | Description

WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description

X | Number or String | The longitude value of the point.

Y | Number or String | The latitude value of the point.

precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).

Return Values

Return Type | Description

WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description

X | Number or String | The longitude value of the point.

Y | Number or String | The latitude value of the point.

precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).

Return Values

Return Type | Description

String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID('-73.750333', '42.736103', 3);

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
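The guide does not describe how square hash IDs are built. As a hedged illustration of a grid whose cells are square in a Mercator projection, the sketch below uses the widely documented quadkey tiling scheme as a stand-in. The function names tile_to_quadkey and square_cell_id are hypothetical, and the IDs it produces should not be assumed to match the output of SquareHashID.

```python
# Illustrative stand-in: quadkey tiling in Web Mercator. This is NOT claimed
# to be the product's square hash encoding.
import math

MAX_LAT = 85.05112878  # Mercator is undefined at the poles, so clamp


def tile_to_quadkey(tile_x, tile_y, level):
    """Interleave the bits of the tile column and row into base-4 digits."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def square_cell_id(lon, lat, level):
    """Return the quadkey of the Mercator tile containing a WGS 84 point."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    n = 1 << level                       # tiles per axis at this level
    x = (lon + 180.0) / 360.0            # 0..1 across the map
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tile_x = min(n - 1, max(0, int(x * n)))
    tile_y = min(n - 1, max(0, int(y * n)))
    return tile_to_quadkey(tile_x, tile_y, level)


if __name__ == "__main__":
    print(square_cell_id(-73.750333, 42.736103, 5))
```

Because the tiles are cut in projected Mercator space rather than in degrees, every cell has equal width and height on a Mercator map, which is the property the square hash description highlights.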


Search Functions

The following search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies on or within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description

inputPoint | WritableGeometry | A WritableGeometry representing a point.

dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example

shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'

shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'

remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'

downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'


Return Values

Return Type | Description

geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points to search for. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
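For readers unfamiliar with point-in-polygon search, the following Python sketch shows the classic ray-casting test and returns the attributes of every polygon that contains the point. It is only a conceptual illustration under simplifying assumptions: polygons are single rings without holes, coordinates are treated as planar, and points exactly on an edge are not handled specially. The function names and the record layout are hypothetical and do not reflect how the UDTF reads TAB files or shapefiles.

```python
# Conceptual sketch of a point-in-polygon search using ray casting.
# Assumptions: simple single-ring polygons, planar coordinates.

def point_in_ring(x, y, ring):
    """Return True if (x, y) is inside the ring of (x, y) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:          # crossing lies to the right of the point
                inside = not inside
        j = i
    return inside


def find_containing_polygons(point, records):
    """Return the attributes of every record whose ring contains the point."""
    x, y = point
    return [rec["attributes"] for rec in records
            if point_in_ring(x, y, rec["ring"])]


if __name__ == "__main__":
    records = [
        {"ring": [(0, 0), (4, 0), (4, 4), (0, 4)],
         "attributes": {"state": "A", "capital": "Alpha"}},
        {"ring": [(3, 3), (6, 3), (6, 6), (3, 6)],
         "attributes": {"state": "B", "capital": "Beta"}},
    ]
    print(find_containing_polygons((3.5, 3.5), records))
```

A point that falls where two polygons overlap yields two result rows, which mirrors the statement above that all matching geometries are returned.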


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description

inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.

dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example

maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'

maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'

distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'

returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'

shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'

shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'

remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'

downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'

queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', 'Name Like "Park"'

Return Values

Return Type | Description

geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like "Park"')) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points to search from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
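The effect of the maxCandidates and maxDistance options can be pictured with a small brute-force nearest search. The Python sketch below is a conceptual illustration only: it works on point candidates, measures great-circle distance in meters with the haversine formula, and uses hypothetical names (search_nearest, max_candidates, max_distance_m). It makes no claim about how the UDTF indexes or measures distance internally.

```python
# Conceptual sketch of a nearest search with a result limit and a
# distance cutoff. Assumption: candidates are points; distance is
# great-circle distance on a sphere.
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    lon, lat = point
    scored = []
    for name, (clon, clat) in candidates:
        d = haversine_m(lon, lat, clon, clat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()                         # nearest candidates first
    return [(name, d) for d, name in scored[:max_candidates]]


if __name__ == "__main__":
    capitals = [
        ("Boston", (-71.0589, 42.3601)),
        ("Albany", (-73.7562, 42.6526)),
        ("Hartford", (-72.6734, 41.7658)),
    ]
    for name, dist in search_nearest((-73.750333, 42.736103), capitals,
                                     max_candidates=3):
        print(name, round(dist))
```

With the defaults, only the single nearest candidate is returned and no distance limit applies, matching the defaults listed in the options table.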


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-<version>.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the level number, the higher the level in the hierarchy and the larger the hexagon. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description

df2 | DataFrame | The dataframe to join to.

df1Longitude | Column | The longitude value from the first dataframe.

df1Latitude | Column | The latitude value from the first dataframe.

df2Longitude | Column | The longitude value from the second dataframe.

df2Latitude | Column | The latitude value from the second dataframe.

maxDistance | Length | The buffer length around point 1 to search for point 2.

geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options | Map | Optional. Options that add extra attributes to the result of the join.


Options

Key | Type | Description

DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description

DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
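To show what a distance join produces, here is a small brute-force Python sketch over lists of dictionaries. It is a conceptual illustration only: the function join_by_distance and its parameters are hypothetical, the distance is a haversine great-circle distance in miles, and the geohash-based primary join that the geohashPrecision parameter controls is deliberately left out, so every pair of rows is compared.

```python
# Conceptual brute-force distance join. Assumptions: rows are plain dicts,
# distance is great-circle distance in miles, no geohash pre-filtering.
import math

EARTH_RADIUS_MI = 3958.7613  # mean earth radius in miles


def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_mi, distance_column=None):
    """Pair each left row with every right row within max_distance_mi."""
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_mi(l_row["longitude"], l_row["latitude"],
                             r_row["lon"], r_row["lat"])
            if d <= max_distance_mi:
                row = {**l_row, **r_row}       # combined attributes
                if distance_column is not None:
                    row[distance_column] = d   # optional distance output
                result.append(row)
    return result


if __name__ == "__main__":
    customers = [{"customer": "c1", "longitude": -73.750333,
                  "latitude": 42.736103}]
    pois = [
        {"poi": "near", "lon": -73.7510, "lat": 42.7400},
        {"poi": "far", "lon": -73.8000, "lat": 42.8000},
    ]
    print(join_by_distance(customers, pois, 0.5, "outputDistance"))
```

Comparing every pair grows quadratically with the input size, which is why a coarse spatial key such as a geohash is normally used first to limit the candidate pairs.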


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application | Do this

Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.

2. Start the Spark job using the following command.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here.

a) In Cloudera Manager, navigate to Hue -> Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here.

a) On the master node, open the file /etc/hue/conf/hue.ini.

b) Uncomment the following line and change it from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-ltversiongtzip distribution file Because environments and use cases vary widely we recommend you evaluate the PGD index against a sample to verify that it improves performance

Note A PGD index file is a supplemental file to the TAB file set It is 5-6 times larger than the MAP file for the TAB One PGD file is generated per TAB file except in the case of a seamless TAB which will have PGD files created for each sub-TAB

Also a PGD file will no longer be used by the system if you change the data in the TAB (that is if rows have been added or deleted or a geometry has been changed in the MAP portion of the TAB) In this case you must you must then regenerate the PGD for the updated TAB file

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
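
As a rough illustration of the usage above, the following Python sketch assembles a PGDBuilder argument list. The helper name build_pgd_command, the executable name, and the choice to cap -p at the local CPU count are assumptions made for this example; only the -f and -p options come from this guide.

```python
import os


def build_pgd_command(tab_path, parallel=None, executable="PGDBuilder"):
    """Return the argument list for one PGD Builder run.

    `executable` is an assumed name for however the utility is launched
    on your system; only -f and -p are taken from the documented usage.
    """
    if not tab_path:
        raise ValueError("a TAB file path is required (-f)")
    command = [executable, "-f", tab_path]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        # Design choice of this sketch: never request more processes
        # than the machine has CPUs.
        cpu_count = os.cpu_count() or 1
        command += ["-p", str(min(parallel, cpu_count))]
    return command


print(build_pgd_command("uk.tab"))
```

The returned list can be passed to a process launcher such as subprocess.run once the executable name matches your installation.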

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
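
To see what the 775 mode used in step 5 grants, the following Python sketch decodes it with the standard stat module. The helper names are illustrative only and are not part of the product.

```python
import stat


def describe_mode(octal_mode):
    """Return the ls-style permission string for a directory with this mode."""
    return stat.filemode(stat.S_IFDIR | octal_mode)


def group_can_write(octal_mode):
    """Return True when the group write bit is set."""
    return bool(octal_mode & stat.S_IWGRP)


# Owner and group get read, write, and execute; other users get read and execute.
print(describe_mode(0o775))
print(group_can_write(0o775))
```

The group write bit is what allows every member of the common group to update the downloaded data; a mode such as 755 would leave the group without it.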

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first object's centroid is within the second object

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or subquery

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
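
The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This illustrates the matching rules only and is not the MI SQL implementation; the function names are hypothetical, and the sketch assumes case-sensitive matching, which this guide does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Hudson River", "Hudson%"))
print(like("cart", "c_t"))
```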

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
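
The doubling rule can be sketched in Python as follows. The helper names are hypothetical, and SQLite is used only as a stand-in engine that follows the same standard quoting rule for string literals; it is not MI SQL.

```python
import sqlite3


def quote_literal(value):
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because this guide does not say
    how they should be escaped.
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


literal = quote_literal("O'hara's")
print(literal)

# Round trip through SQLite (stand-in engine) to confirm the literal parses back.
connection = sqlite3.connect(":memory:")
row = connection.execute("SELECT " + literal).fetchone()
connection.close()
print(row[0])
```

In application code, bound query parameters are usually preferable to building literals by hand; the sketch exists only to make the quoting rule concrete.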

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.

All rights reserved

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Parameter Type Description

X String or Number The X ordinate

Y String or Number The Y ordinate

CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326

Return Values

Return Type Description

WritableGeometry The geometry of the specified X, Y coordinates. If any of the argument values are invalid, an empty geometry is returned in the output.

Examples

SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333, 42.736103);

SELECT ST_Point('-73.750333', '42.736103');

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

Persistence Functions

The following Persistence functions are available

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML

ToGeoJSON

Description

The ToGeoJSON function returns a GeoJSON-formatted text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array
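
For orientation, the following Python sketch shows the standard WKB layout for a single 2D point: a byte-order flag, a 4-byte geometry type code, and two 8-byte doubles. It assumes little-endian output when encoding; this guide does not state which byte order ToWKB emits, and the helper names are hypothetical.

```python
import struct

WKB_POINT = 1  # geometry type code for Point in the WKB standard


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # 1-byte order flag (1 = little-endian), 4-byte type code, two 8-byte doubles
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode a 2D point from WKB written in either byte order."""
    order = "<" if data[0] == 1 else ">"
    geometry_type, x, y = struct.unpack(order + "Idd", data[1:21])
    if geometry_type != WKB_POINT:
        raise ValueError("not a WKB point")
    return x, y


encoded = point_to_wkb(-73.750333, 42.736103)
print(len(encoded), encoded.hex()[:10])
print(wkb_to_point(encoded))
```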

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty otherwise False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
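
The ring-based rule above can be sketched in Python for planar (CARTESIAN) coordinates using the shoelace formula. This is a minimal illustration with hypothetical function names; it does not reproduce the SPHERICAL computation or the unit conversion that Area performs.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) tuples."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(square, [hole]))
```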

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries Geometries that intersect are at distance zero from each other Distance is always non-negative
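
To illustrate why the computation type matters, the following Python sketch compares a planar distance with a great-circle distance between two points. The haversine formula and the mean Earth radius are assumptions chosen for this example; this guide does not document the exact spherical model that Distance uses, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed for this sketch)

# Meters per unit for a few of the linear unit codes listed above.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}


def cartesian_distance(x1, y1, x2, y2):
    """Planar distance in the same units as the input coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance(lon1, lat1, lon2, lat2, unit="m"):
    """Great-circle (haversine) distance between two long/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    meters = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return meters / METERS_PER_UNIT[unit]


# One degree of longitude spans roughly half as far at 60 degrees latitude
# as at the equator; planar logic on raw degrees cannot see that difference.
print(spherical_distance(0, 0, 1, 0, "km"))
print(spherical_distance(0, 60, 1, 60, "km"))
print(cartesian_distance(0, 60, 1, 60))
```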

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
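
The effect of the resolution parameter can be sketched in Python for the simplest case, a planar buffer around a point. The function name is hypothetical, the minimum of 3 segments is an assumption of the sketch, and the real Buffer function also handles other geometry types and SPHERICAL computation.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y).

    `resolution` is the number of straight-line segments in the ring.
    """
    if resolution < 3:
        # Assumption of this sketch: fewer than 3 segments cannot form a polygon.
        raise ValueError("resolution must be at least 3")
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring
    return ring


octagon = buffer_point(5, 6, 50, 8)
print(len(octagon) - 1, "segments")
```

A resolution of 12, as in the examples in this section, yields a 12-sided ring by the same construction.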

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
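
One common way to compute the convex hull of a set of points is Andrew's monotone chain algorithm, sketched below in Python. This guide does not say which algorithm ConvexHull uses, so treat this as an illustration of the definition rather than the product's implementation; the function names are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


sample = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(convex_hull(sample))
```

Interior points such as (2, 2) and (1, 3) are discarded because they lie inside the polygon formed by the remaining vertices.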

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(hivetable.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));


Intersection

Description

The Intersection function returns the geometry (a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
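To give a sense of what a transformation from epsg:4326 to epsg:3857 involves, the following Python sketch applies the standard spherical Mercator formulas. It is an illustration of that one coordinate system pair only, not the product's implementation, and the function name lonlat_to_web_mercator is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS 84 semi-major axis)


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project a WGS 84 longitude/latitude (degrees) to EPSG:3857 meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y


x, y = lonlat_to_web_mercator(30.0, 20.0)
```

Longitude maps linearly to x, while latitude is stretched increasingly toward the poles, which is why the projection is undefined at exactly 90 degrees north or south.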


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another by extracting the ordinates of the transformed point.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) of a WritableGeometry.

The following Observer index functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
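The bounds-based filtering described above can be sketched in plain Python. The example assumes a geometry represented as a list of (x, y) vertices; the helper names mbr and within_bounds are hypothetical and only mirror what the four MBR functions return.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) vertices.

    Returns None for an empty sequence, similar to the UDFs returning null
    when no geometry is supplied.
    """
    coords = list(coords)
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def within_bounds(x, y, bounds):
    """True if the point falls inside (or on the edge of) the bounding rectangle."""
    x_min, y_min, x_max, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max


triangle = [(40, 40), (20, 45), (45, 30)]
bounds = mbr(triangle)
# Keep only the XY records that fall within the geometry's bounds.
records = [(25, 35), (50, 50), (44, 31)]
kept = [r for r in records if within_bounds(r[0], r[1], bounds)]
```

A bounds test of this kind is a coarse filter: a record can fall inside the MBR but outside the geometry itself.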


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
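The relationship between a geohash ID, its precision, and its cell boundary can be sketched in Python with the publicly documented geohash algorithm. This assumes the product follows the standard encoding (the guide calls the ID "well-known" but does not spell out the algorithm); the function names geohash_id and geohash_boundary are hypothetical stand-ins for the UDFs.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude as a geohash string of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars, bits, bit_count = [], 0, 0
    even = True  # even-numbered bits halve the longitude range, odd ones the latitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2.0
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2.0
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        bit_count += 1
        even = not even
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash string to its cell bounds (lon_min, lat_min, lon_max, lat_max)."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    even = True
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if even:
                mid = (lon_lo + lon_hi) / 2.0
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2.0
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            even = not even
    return lon_lo, lat_lo, lon_hi, lat_hi


cell = geohash_boundary(geohash_id(-73.750333, 42.736103, 3))
```

Because each extra character only subdivides the previous cell, a shorter ID is always a prefix of a longer ID for the same point, which is what makes the IDs sortable and convenient for GROUP BY aggregation.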


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)


Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.


options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of pb.download.group set in Hive (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
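The core test behind a point-in-polygon search can be sketched in Python with the classic ray-casting (even-odd) rule. This is a general illustration for simple polygons without holes; it is not the product's implementation, and the names point_in_ring and polygons_containing, as well as the sample data, are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Even-odd ray casting: count how many edges a ray to the right of (x, y) crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygons_containing(x, y, polygons):
    """Return the names of every polygon that contains the point."""
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]


# Hypothetical data: two overlapping squares keyed by name.
polygons = {
    "west": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "east": [(8, 0), (20, 0), (20, 10), (8, 10)],
}
hits = polygons_containing(9, 5, polygons)
```

As with the UDTF, one input point can produce several output rows when polygons overlap.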


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries in the specified TAB or shapefile that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of pb.download.group set in Hive (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
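How maxCandidates and maxDistance interact can be sketched in Python as a small nearest-neighbor search. The sketch assumes planar coordinates and point candidates only, and the name search_nearest and the sample data are hypothetical; the UDTF itself works on the geometries in a TAB or shapefile.

```python
import heapq
import math


def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to `max_candidates` (distance, name) pairs, nearest first.

    Candidates farther than `max_distance` are dropped; None means no limit.
    """
    px, py = point
    scored = []
    for name, (cx, cy) in candidates.items():
        distance = math.hypot(cx - px, cy - py)
        if max_distance is None or distance <= max_distance:
            scored.append((distance, name))
    return heapq.nsmallest(max_candidates, scored)


# Hypothetical candidate points keyed by name.
candidates = {"a": (1.0, 0.0), "b": (3.0, 0.0), "c": (10.0, 0.0)}
default_result = search_nearest((0.0, 0.0), candidates)
limited_result = search_nearest((0.0, 0.0), candidates, max_candidates=3, max_distance=5.0)
```

With the defaults only the single nearest candidate is returned; raising max_candidates returns more rows per search point, and max_distance can still reduce that number.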


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, representing the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
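The geohashPrecision value corresponds to the number of characters in a geohash string: each extra character adds five bits and shrinks the cell. The sketch below implements the standard public geohash encoding in Python so the effect of the precision is visible. It is an illustration only; the function name is hypothetical and the SDK's internal use of geohashes in the primary join is not documented here.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # number of bits collected for the current character
    value = 0       # the 5-bit value being built
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


# A lower precision is always a prefix of a higher precision for the same point.
for p in (1, 5, 7):
    print(p, geohash_encode(57.64911, 10.40744, p))
```

Because a coarser hash is a prefix of a finer one, raising the precision splits each cell into smaller cells, which is consistent with the note that higher values may require more memory.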


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case:

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue -> Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hue.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that contains all the service users who need to download the data. For example, if Hive and YARN jobs must download data and use the same download location, then both the Hive and YARN operating system users should belong to a common operating system group. The group of the download directory should be this common group, with Read, Write, and Execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group, and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
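As a quick check on what mode 775 grants, the following Python sketch decodes the permission bits with the standard library. The helper names are hypothetical and nothing here touches the file system; it only interprets the number.

```python
import stat


def describe_directory_mode(mode):
    """Return the ls-style permission string for a directory with these bits."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True if members of the owning group may create or replace files."""
    return bool(mode & stat.S_IWGRP)


def others_can_write(mode):
    """True if users outside the owner and group may write."""
    return bool(mode & stat.S_IWOTH)


print(describe_directory_mode(0o775))  # owner and group: rwx, others: r-x
print(group_can_write(0o775), others_can_write(0o775))
```

With 775, every member of the common group can write to the download directory, while users outside the group can only read and traverse it.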


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
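The Like wildcard rules can be sketched in Python by translating a pattern into a regular expression: the underscore matches exactly one character and the percent sign matches zero or more. The function names are hypothetical, and case-sensitive matching is an assumption of this sketch rather than a documented property of MI SQL.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Hudson River", "Hudson%"))  # percent absorbs the rest of the string
print(like("cat", "_at"))               # underscore stands for one character
print(like("chat", "_at"))              # two leading characters do not fit one underscore
print(like("Troy", "T_o%"))             # wildcards used in combination
```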


Arithmetic Operators

Operator | Definition
+ | Addition; also concatenation operator. NOTE: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List items and function argument separators
@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers with illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single-quote occurs within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
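When query text is assembled programmatically, the quoting rules above can be applied with two small helpers. This Python sketch is illustrative only: the function names are hypothetical, and because the guide does not say how to escape a double quote inside an identifier, the identifier helper rejects such names instead of guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Escaping of embedded double quotes is not covered by the guide,
    so this sketch refuses names that contain one.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'haras")
)
print(query)
```

Building literals this way keeps an apostrophe in the data from terminating the string early.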


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Persistence Functions

The following Persistence functions are available:

• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
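To show how the output formats differ, the following Python sketch serializes one point as GeoJSON text, WKT text, and WKB bytes using only the standard library. The helper names are hypothetical and the sketch handles points only; the Hive functions accept any WritableGeometry. KML is omitted here.

```python
import json
import struct


def point_to_geojson(x, y):
    """GeoJSON text for a point; coordinates are ordered [x, y]."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


def point_to_wkt(x, y):
    """Well-Known Text for a point."""
    return f"POINT ({x} {y})"


def point_to_wkb(x, y):
    """Well-Known Binary for a point.

    Layout: 1 byte order flag (1 = little-endian), a 4-byte geometry
    type (1 = Point), then x and y as 8-byte doubles.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


x, y = -73.69, 42.73
print(point_to_geojson(x, y))
print(point_to_wkt(x, y))
print(point_to_wkb(x, y).hex())
```

The two text formats are human-readable, while WKB is a compact byte array, which is why ToWKB returns bytes rather than a string.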

ToGeoJSON

Description

The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

GeoJSON String The GeoJSON representation of a geometry

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKT

Description

The ToWKT function returns the Well-Known Text (WKT) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The WKT representation of a geometry

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToWKB

Description

The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted as KML, in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), for the geometry in the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
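The three possible results of a predicate (True, False, or Null) can be sketched in Python with axis-aligned rectangles standing in for geometries. The rectangle representation and function names are assumptions of this sketch; the Hive functions operate on full WritableGeometry objects. The sketch also shows that Disjoint is the logical complement of Intersects whenever both inputs are present.

```python
def intersects(a, b):
    """Rectangles are (min_x, min_y, max_x, max_y); None models a null geometry.

    Returns None if either input is None, otherwise True when the
    rectangles share at least one point (touching edges count).
    """
    if a is None or b is None:
        return None
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def disjoint(a, b):
    """True when the rectangles have no point in common; None if either is None."""
    result = intersects(a, b)
    return None if result is None else not result


unit_square = (0.0, 0.0, 1.0, 1.0)
far_square = (5.0, 5.0, 6.0, 6.0)
touching_square = (1.0, 0.0, 2.0, 1.0)

print(disjoint(unit_square, far_square))       # no shared points
print(disjoint(unit_square, touching_square))  # shared edge, so not disjoint
print(disjoint(unit_square, None))             # null input propagates
```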


Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));


Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;


IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty otherwise False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;


Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of the given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
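The "exterior ring minus interior rings" rule can be illustrated with a short Python sketch that uses the shoelace formula on planar coordinates. This corresponds to Cartesian logic only and is not the SDK's algorithm; the function names are hypothetical, and the result is in squared input units with no unit conversion.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) vertices.

    Works whether or not the ring repeats its first vertex at the end.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1  # shoelace cross term for one edge
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(square))          # 10 x 10 square
print(polygon_area(square, [hole]))  # same square with a 2 x 2 hole removed
```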

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
areaUnits | String | The desired return unit type. For valid values, see Area Units on page 30.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units:

Value | Description
sq mi | square miles
sq km | square kilometers
sq in | square inches
sq ft | square feet
sq yd | square yards
sq mm | square millimeters
sq cm | square centimeters
sq m | square meters
sq survey ft | square US Survey feet
sq nmi | square nautical miles
acre | acres
ha | hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;


ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;


Distance

Description

The Distance function calculates and returns the distance between two geometries

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units on page 33.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. The distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
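The difference between the two computation types, and the role of the linear unit, can be illustrated outside Hive. The following Python sketch is not the product's implementation: it assumes a haversine great-circle formula for SPHERICAL, a planar formula on metre coordinates for CARTESIAN, and a mean Earth radius chosen for illustration. The function name point_distance and the conversion table are hypothetical.

```python
import math

# Metres per unit for the codes in the Linear Units table.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,
    "nmi": 1852.0,
}

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; an assumption of this sketch


def point_distance(p1, p2, linear_units, computation_type="SPHERICAL"):
    """Distance between two (x, y) points, expressed in linear_units."""
    (x1, y1), (x2, y2) = p1, p2
    if computation_type == "CARTESIAN":
        # Planar distance; coordinates are assumed to be in metres.
        metres = math.hypot(x2 - x1, y2 - y1)
    elif computation_type == "SPHERICAL":
        # Haversine distance; x is longitude and y is latitude, in degrees.
        phi1, phi2 = math.radians(y1), math.radians(y2)
        dphi = phi2 - phi1
        dlam = math.radians(x2 - x1)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
        metres = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    else:
        raise ValueError("unknown computation type: " + computation_type)
    return metres / METERS_PER_UNIT[linear_units]
```

For instance, point_distance((0, 0), (0, 1), "km") gives the length of one degree of latitude on the assumed sphere, while the CARTESIAN branch treats the same numbers as plain planar coordinates.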

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
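The rule that a polygon's perimeter is the sum of the lengths of all its rings can be checked with a small planar calculation. The Python sketch below is only an illustration under Cartesian assumptions; the helper names ring_length and polygon_perimeter are hypothetical and are not part of the SDK.

```python
import math


def ring_length(ring):
    """Planar length of a ring given as a list of (x, y) vertices."""
    # Close the ring if the last vertex does not repeat the first one.
    closed = ring if ring[0] == ring[-1] else ring + [ring[0]]
    return sum(math.dist(a, b) for a, b in zip(closed, closed[1:]))


def polygon_perimeter(exterior, holes=()):
    """Sum of the lengths of the exterior ring and of every hole."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_perimeter(outer, [hole]))  # exterior 40 plus hole 8
```

Note that the hole adds to the perimeter rather than subtracting from it, which is the behavior the description above states for polygons with holes.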

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The linear unit of the offset. For valid values, see Linear Units below.

resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
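The effect of the resolution parameter can be pictured with a planar sketch in Python. It assumes, purely for illustration, that the vertices of the approximating polygon lie on the circle itself; how the SDK actually places them is not stated here. The names point_buffer and ring_area are hypothetical.

```python
import math


def point_buffer(x, y, radius, resolution):
    """Closed ring of a regular polygon with `resolution` sides around (x, y)."""
    if resolution < 3:
        raise ValueError("resolution must be at least 3")
    step = 2 * math.pi / resolution
    ring = [(x + radius * math.cos(i * step), y + radius * math.sin(i * step))
            for i in range(resolution)]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


def ring_area(ring):
    """Planar area of a closed ring (shoelace formula)."""
    total = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(total) / 2


octagon = point_buffer(5, 6, 50, 8)  # resolution 8 yields an octagon
```

Raising the resolution brings the polygon's area closer to that of the true circle, at the cost of more vertices to store and process.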

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
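To see what "the smallest convex geometry that contains all the points" means in practice, the following Python sketch computes a planar convex hull with Andrew's monotone chain algorithm. This is a standard textbook technique chosen for illustration, not a description of how the ConvexHull UDF is implemented; the function name convex_hull is hypothetical.

```python
def convex_hull(points):
    """Convex hull of 2D points, counter-clockwise (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Vertices of the MULTIPOLYGON used in the example above.
vertices = [(40, 40), (20, 45), (45, 30), (20, 35), (10, 30), (10, 10),
            (30, 5), (45, 20), (30, 20), (20, 15), (20, 25)]
hull = convex_hull(vertices)
```

The vertices of the hole and the concave vertex (20, 35) drop out of the result, leaving only the outermost points.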

Intersection

Description

The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
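For the specific pair of coordinate systems used in the first example (epsg:4326 to epsg:3857), the mapping follows the well-known spherical Web Mercator formulas. The Python sketch below applies those formulas to a single point so the kind of change Transform makes to coordinates is visible; it is an illustration only, and the function names are hypothetical rather than SDK calls.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by epsg:3857


def lonlat_to_web_mercator(lon, lat):
    """Convert degrees of longitude/latitude to epsg:3857 metres."""
    r = WGS84_SEMI_MAJOR_AXIS_M
    x = r * math.radians(lon)
    y = r * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse conversion, from epsg:3857 metres back to degrees."""
    r = WGS84_SEMI_MAJOR_AXIS_M
    lon = math.degrees(x / r)
    lat = math.degrees(2 * math.atan(math.exp(y / r)) - math.pi / 2)
    return lon, lat


x, y = lonlat_to_web_mercator(30, 20)
lon, lat = web_mercator_to_lonlat(x, y)
```

Converting a point and then converting it back recovers the original longitude and latitude up to floating-point rounding.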

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by returning the ordinates of the transformed point.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
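The idea of filtering an XY table by the bounds of a geometry can be sketched in a few lines of Python. The sketch assumes the table is a list of rows with x and y fields and that the geometry is available as a list of vertices; the helper names mbr and filter_by_mbr are hypothetical and only mirror what the four min/max functions return.

```python
def mbr(points):
    """Minimum bounding rectangle of (x, y) vertices: (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep the rows whose point lies within the bounds (edges included)."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in rows if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


triangle = [(0, 0), (4, 1), (2, 5)]
table = [
    {"id": 1, "x": 1.0, "y": 1.0},
    {"id": 2, "x": 5.0, "y": 1.0},
    {"id": 3, "x": 3.5, "y": 4.5},
]
kept = filter_by_mbr(table, mbr(triangle))
```

A bounds filter is coarse: row 3 falls inside the rectangle even though it lies outside the triangle itself, so an exact test may still be needed afterwards.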

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
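How a point and a precision turn into a cell ID can be illustrated with the public geohash scheme. The Python sketch below follows the standard geohash encoding (alternating longitude and latitude bisection, five bits per base-32 character); whether the GeoHashID UDF produces identical strings is an assumption, and the function name geohash_encode is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Standard geohash of a longitude/latitude pair, `precision` characters long."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


short_id = geohash_encode(10.40744, 57.64911, 5)
long_id = geohash_encode(10.40744, 57.64911, 11)
```

Because each extra character only subdivides the previous cell, a longer ID always starts with the shorter one, which is what makes these IDs convenient to sort, search by prefix, and aggregate.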

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions in the Appendix.
Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state FROM pip_points LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder in the Appendix.
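The core test behind a point-in-polygon search can be sketched with the classic ray-casting rule. The Python below is a conceptual illustration on planar coordinates, not the UDTF's implementation; it ignores holes and spatial indexing, and the names point_in_ring and polygons_containing, along with the sample attribute rows, are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray casting: True if (x, y) is strictly inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def polygons_containing(x, y, polygons):
    """Attribute rows of every polygon whose ring contains the point."""
    return [attrs for ring, attrs in polygons if point_in_ring(x, y, ring)]


polygons = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], {"state": "A", "capital": "Alpha"}),
    ([(5, 5), (15, 5), (15, 15), (5, 15)], {"state": "B", "capital": "Beta"}),
]
hits = polygons_containing(7, 7, polygons)
```

As with the UDTF, a point that falls inside several overlapping polygons yields one result row per polygon.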

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option: maxCandidates
Description: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'

Option: maxDistance
Description: The maximum distance to search for results (if not set, the default value is no limit).
Example: 'maxDistance', '25'

Option: distanceUnit
Description: The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units.
Example: 'distanceUnit', 'mi'

Option: returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions in the Appendix.
Example: 'downloadGroup', 'pbdownloads'

Option: queryFilter
Description: Search on a subset of the data based on an attribute.
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters in the Appendix.
Example: 'queryFilter', 'Name Like Park'

Return Values

Return Type | Description

geometry | The nearest geometry or geometries contained in the specified TAB file or shapefile to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map(
        'maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', "Name Like 'Park'"
    )
) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
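The effect of the maxCandidates and maxDistance options can be pictured with a small stand-alone sketch. It is not the UDTF's implementation: it assumes plain planar distance, treats maxDistance as inclusive, and uses hypothetical feature records (`name`, `xy`) purely for illustration.

```python
import math

def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, feature) pairs, nearest first.

    Assumption: planar Euclidean distance and an inclusive max_distance.
    The real function works on geometries and honours distanceUnit.
    """
    scored = []
    for feature in features:
        d = math.dist(point, feature["xy"])
        if max_distance is None or d <= max_distance:
            scored.append((d, feature))
    scored.sort(key=lambda pair: pair[0])
    return scored[:max_candidates]

# Hypothetical data standing in for records of a TAB file.
capitals = [
    {"name": "A", "xy": (0.0, 1.0)},
    {"name": "B", "xy": (3.0, 4.0)},
    {"name": "C", "xy": (6.0, 8.0)},
    {"name": "D", "xy": (30.0, 40.0)},
]

default_hit = [f["name"] for _, f in search_nearest((0.0, 0.0), capitals)]
top_three = [f["name"] for _, f in search_nearest((0.0, 0.0), capitals, max_candidates=3)]
within_five = [f["name"] for _, f in search_nearest((0.0, 0.0), capitals, 3, 5.0)]
```

With no options only the single nearest record comes back, which mirrors the documented default of 1 for maxCandidates; adding a distance limit can return fewer rows than maxCandidates allows.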

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


Spark

Spark Jobs

Spark jobs for processing large data sets are provided with Spectrum Location Intelligence for Big Data and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces the same hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes one pair of longitude and latitude columns from each dataframe, representing the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description

df2 | DataFrame | The dataframe to join to.

df1Longitude | Column | The longitude value from the first dataframe.

df1Latitude | Column | The latitude value from the first dataframe.

df2Longitude | Column | The longitude value from the second dataframe.

df2Latitude | Column | The latitude value from the second dataframe.

maxDistance | Length | The buffer length around point 1 to search for point 2.

geohashPrecision | Integer | The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options | Map | Optional. Options that add extra attributes to the result of the join.
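To give a feel for what the geohashPrecision value controls, the sketch below implements the standard public geohash encoding, in which each extra character narrows the cell that contains a point. It is an illustration only: whether the SDK uses exactly this encoding internally is an assumption, and the function name `geohash_encode` is hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision):
    """Encode a WGS84 latitude/longitude as a standard geohash string."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)

coarse = geohash_encode(57.64911, 10.40744, 3)
fine = geohash_encode(57.64911, 10.40744, 7)
```

A longer hash always begins with the shorter one, so a low precision groups many nearby points into one coarse cell while a high precision produces many small cells, which is consistent with the note that higher values may need more memory.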


Options

Key | Type | Description

DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description

DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
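The result of a distance join can be reasoned about with a brute-force sketch outside Spark. The code below is not how joinByDistance is implemented (the method uses a geohash-based primary join); it assumes a spherical earth with the haversine formula, and the row layout and the `join_by_distance` helper are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean earth radius, spherical approximation
METERS_PER_MILE = 1609.344

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_m, distance_column=None):
    """Pair every left row with every right row no farther than max_distance_m."""
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                row = {**l_row, **r_row}
                if distance_column is not None:
                    row[distance_column] = d  # optional extra column, in meters
                joined.append(row)
    return joined

# Hypothetical sample rows.
customers = [{"customer": "c1", "longitude": -73.9857, "latitude": 40.7484}]
pois = [
    {"poi": "near", "lon": -73.9851, "lat": 40.7489},
    {"poi": "far", "lon": -73.9000, "lat": 40.9000},
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE, "outputDistance")
```

Comparing every pair is quadratic in the number of rows, which is why a real implementation first narrows the candidates with a coarse spatial key before computing exact distances.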


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application | Do this

Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description

-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
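When several TAB files need an index, the documented usage line can be assembled programmatically. The helper below is a hypothetical convenience that only builds the argument list from the two documented flags; it assumes the utility is reachable under the name PGDBuilder and does not run anything itself.

```python
def pgd_builder_command(tab_file, parallel=None):
    """Build the argument list for: PGDBuilder -f <file> [-p <parallel>]."""
    command = ["PGDBuilder", "-f", str(tab_file)]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be a positive integer")
        command += ["-p", str(parallel)]
    return command

# Hypothetical file names; each list could be handed to subprocess.run().
commands = [
    pgd_builder_command("uk.tab"),
    pgd_builder_command("seamless.tab", parallel=4),
]
```

Keeping the arguments as a list rather than one string avoids quoting problems when a path contains spaces.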


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
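As a quick check of what the 775 mode grants, the octal value can be decoded with Python's standard stat module. This is only an illustration of the permission bits; the helper names are hypothetical and nothing here touches the actual download directory.

```python
import stat

def describe_directory_mode(mode):
    """Render permission bits the way `ls -l` shows them for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)

def group_can_write(mode):
    """True when members of the owning group may create or replace files."""
    return bool(mode & stat.S_IWGRP)

def others_can_write(mode):
    """True when users outside the owner and group may write."""
    return bool(mode & stat.S_IWOTH)

listing = describe_directory_mode(0o775)
```

The owner and the common group get read, write, and execute, while all other users get read and execute only, which is what lets every service user in the group update the downloaded data.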


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition

Attribute operators | =, <>, !=, <, <=, >, >=

Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.

Contains | Returns true if the first object contains all of the second object.

Within | Returns true if the first object is entirely inside the second object.

ContainsCentroid | Returns true if the first object contains the centroid of the second object.

CentroidWithin | Returns true if the first object's centroid is within the second object.

Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) | Returns true if the value equals at least one of the values in the literal list or sub query.

Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND | Returns true if both conditions in the WHERE clause are true.

OR | Returns true if either the first or second condition is true.

NOT | Reverses the meaning of the logical operator with which it is used.
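The wildcard rules described for Like can be tried out with a small translation to a regular expression. This is a sketch of the matching rules as described in the table, not the MI SQL engine: it assumes case-sensitive matching of the whole value, and the function names are hypothetical.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern: % is any run of characters, _ is exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

ends_with_park = like("Central Park", "%Park")
one_char_prefix = like("Park", "_ark")
```

Escaping the literal characters matters: without it, a dot in a pattern such as `a.c` would wrongly behave like a wildcard.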


Arithmetic Operators

Operator | Definition

+ | Addition; also the concatenation operator. Note: String concatenation also uses &.

- | Subtraction

* | Multiplication

/ | Division

^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition

( ) | Expression delimiters

' ' | String constant delimiters. See Quote Rules on page 85.

" " | Quoted identifier delimiters

% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.

, | List items and function argument separators

@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples


Identifiers containing illegal characters (characters that normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
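The quoting rules can be applied mechanically when a filter expression is assembled in code. The helpers below are a hypothetical sketch based only on the rules stated above; the guide does not say how a double quote inside an identifier should be handled, so the sketch rejects that case, and the sample value is invented.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes (for names with spaces or special characters)."""
    if '"' in name:
        raise ValueError("identifiers containing a double quote are not handled in this sketch")
    return '"' + name + '"'

# Hypothetical business name used only to show the escaping.
query = "SELECT * FROM {table} WHERE Business = {value}".format(
    table=quote_identifier("/Samples/NamedTables/USA"),
    value=quote_literal("Joe's Diner"),
)
```

Doubling the embedded quote keeps the literal intact instead of ending the string early at the apostrophe.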


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved.


ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Parameter | Type | Description

geometry | WritableGeometry | The instance of a WritableGeometry.

Return Values

Return Type | Description

String | The WKT representation of a geometry.

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter | Type | Description

geometry | WritableGeometry | The instance of a WritableGeometry.

Return Values

Return Type | Description

Byte[ ] | The WKB representation of a geometry, expressed as a byte array.

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter | Type | Description

geometry | WritableGeometry | The instance of a WritableGeometry.

Return Values

Return Type | Description

String | The KML representation of a geometry, expressed as a hexadecimal encoded string.

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;


Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description

geometry1 | WritableGeometry | The first instance of a WritableGeometry.

geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description

Boolean | True if the two geometry objects have no points in common; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description

geometry1 | WritableGeometry | The first instance of a WritableGeometry.

geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description

Boolean | True if there is any direct position in common between the two geometries; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));


Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description

geometry1 | WritableGeometry | The first instance of a WritableGeometry.

geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description

Boolean | True if geometry1 overlaps geometry2; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object

Function Registration

create function Within as compbbigdataspatialhivepredicateWithin

Syntax

Within(WritableGeometry geometry1 WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty otherwise False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
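The "exterior ring minus interior rings" rule can be illustrated with a minimal Python sketch. It assumes planar (Cartesian) coordinates and uses the shoelace formula; the function names are hypothetical and this is not the product's implementation, which also supports spherical computation and unit conversion.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as [(x, y), ...] (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_area(exterior, [hole]))  # 96.0
```

A 10 x 10 square with a 2 x 2 hole gives 100 - 4 = 96 square units.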

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
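To show why the computation type matters, the following Python sketch contrasts a Cartesian (straight-line) distance with a spherical (great-circle) distance between two points. It is only an illustration: the haversine formula and the mean Earth radius used here are assumptions, not necessarily what the Distance function uses internally, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def cartesian_distance(p, q):
    """Straight-line distance between (x, y) points in the units of the coordinates."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_m(p, q):
    """Great-circle distance in meters between (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


print(cartesian_distance((0, 0), (3, 4)))          # 5.0
print(spherical_distance_m((0.0, 0.0), (0.0, 90.0)))  # a quarter of a great circle
```

Cartesian logic treats the coordinates as positions on a flat plane, while spherical logic treats them as longitude and latitude on a sphere.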

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and the holes). Curves are treated as thin polygons.
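As a rough illustration of "the sum of the lengths of its rings", the Python sketch below adds the length of the exterior ring to the lengths of the holes. It assumes planar coordinates and hypothetical function names; it does not reproduce the product's spherical logic or unit handling.

```python
import math


def ring_length(ring):
    """Planar length of a ring given as [(x, y), ...]; the ring is closed if needed."""
    points = list(ring)
    if points[0] != points[-1]:
        points.append(points[0])
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def polygon_perimeter(exterior, holes=()):
    """Sum of the lengths of the exterior ring and every interior ring."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_perimeter(exterior, [hole]))  # 48.0
```

Note that a hole increases the perimeter (40 + 8 here) even though it decreases the area.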

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
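The effect of the resolution parameter on a point buffer can be sketched in Python as follows. This is a planar approximation with a hypothetical function name, intended only to show that a resolution of 8 yields an octagon; the product's own buffering (especially with SPHERICAL computation) is more involved.

```python
import math


def point_buffer(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` segments."""
    step = 2 * math.pi / resolution
    return [(x + offset * math.cos(i * step), y + offset * math.sin(i * step))
            for i in range(resolution)]


octagon = point_buffer(0.0, 0.0, 50.0, 8)
print(len(octagon))  # 8
```

Each additional segment adds one vertex to the output, which is why larger resolution values cost more time and space.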

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
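For readers unfamiliar with the concept, here is a short Python sketch that computes a planar convex hull with Andrew's monotone chain algorithm. The algorithm choice and function names are illustrative assumptions; the guide does not state which algorithm the ConvexHull function uses.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]))
# [(0, 0), (4, 0), (4, 4), (0, 4)]
```

The interior points (2, 2) and (1, 3) are discarded because they lie inside the square formed by the other four points.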

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(tablegeometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another
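As one concrete case of a coordinate system transformation, the Python sketch below converts WGS 84 longitude/latitude (epsg:4326) to spherical Web Mercator (epsg:3857) using the standard spherical formulas. It covers only this single pair of coordinate systems and uses a hypothetical function name; the Transform function itself accepts other destination coordinate systems and its internals are not described in this guide.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # sphere radius used by spherical Web Mercator


def wgs84_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = SPHERE_RADIUS_M * math.radians(lon)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


x, y = wgs84_to_web_mercator(30.0, 20.0)
print(x, y)
```

The formulas are undefined at the poles, so this sketch should only be used for latitudes strictly between -90 and 90 degrees.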

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means it cannot by itself transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y ordinates from the transformed geometry.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
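The idea of filtering XY records by the bounds of a geometry can be sketched in plain Python as shown below. The function names and sample data are hypothetical; in Hive, the four bounds would come from ST_XMin, ST_YMin, ST_XMax, and ST_YMax rather than from this helper.

```python
def mbr(coords):
    """Minimum bounding rectangle of [(x, y), ...] as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(records, bounds):
    """Keep only the (x, y) records that fall inside the bounds (edges included)."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in records if xmin <= x <= xmax and ymin <= y <= ymax]


triangle = [(0, 0), (6, 1), (3, 5)]
records = [(1, 1), (7, 2), (3, 5), (-1, 0)]
print(mbr(triangle))                          # (0, 0, 6, 5)
print(filter_by_mbr(records, mbr(triangle)))  # [(1, 1), (3, 5)]
```

An MBR filter is a coarse test: a record can fall inside the rectangle but outside the geometry itself, so a predicate such as Within is still needed for an exact answer.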

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

Three types of UDFs are provided, one for each grid cell shape:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
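To make the idea of encoding a location into a cell ID more concrete, the following Python sketch implements the widely used public geohash encoding (interleaved longitude/latitude bisection, base-32 alphabet). It is offered as an illustration only, with a hypothetical function name; whether the GeoHashID UDF produces byte-for-byte identical IDs is an assumption that should be checked against the product.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash string of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = bits * 2 + 1
                lon_lo = mid
            else:
                bits = bits * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = bits * 2 + 1
                lat_lo = mid
            else:
                bits = bits * 2
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(10.40744, 57.64911, 5))  # u4pru
```

Each additional character narrows the cell further, so a longer ID always identifies a cell nested inside the cell named by its prefix.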

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');
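Decoding a cell ID back into its rectangular boundary can be sketched in Python using the public geohash scheme. As with any illustration here, the function name is hypothetical and the assumption is that the UDF follows the standard geohash layout; the UDF returns a WritableGeometry rather than a tuple of bounds.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(cell_id):
    """Return (lon_min, lat_min, lon_max, lat_max) for a geohash cell ID."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


print(geohash_bounds("u4"))  # (0.0, 56.25, 11.25, 61.875)
```

The four bounds are the corners of the rectangle that the boundary geometry would describe.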

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each such geometry is returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group to apply to downloaded data on the local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each such geometry is returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), '/STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, there is no limit). 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group to apply to downloaded data on the local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. 'downloadGroup', 'pbdownloads'

queryFilter Searches a subset of the data based on an attribute. 'queryFilter', "Name Like '%Park%'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), '/STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon. These hexagons form a fixed network in which each hexagon has a unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to.

df1Longitude Column The longitude value from the first dataframe.

df1Latitude Column The latitude value from the first dataframe.

df2Longitude Column The longitude value from the second dataframe.

df2Latitude Column The latitude value from the second dataframe.

maxDistance Length The buffer length around point 1 to search for point 2.

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
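To give a feel for what the geohashPrecision values 1 through 12 mean, the following is a minimal Python sketch of the standard public geohash encoding. It is an illustration only, assuming that the precision has its usual geohash meaning (the number of base-32 characters in the key); it is not the product's implementation, and the names geohash_encode and cell_size_degrees are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon, lat, precision):
    """Encode a WGS 84 lon/lat as a geohash key of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars, bits, value = [], 0, 0
    use_lon = True  # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = value * 2 + int(bit)
        bits += 1
        use_lon = not use_lon
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def cell_size_degrees(precision):
    """Width and height, in degrees, of a geohash cell at this precision."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


key7 = geohash_encode(-73.750333, 42.736103, 7)
key5 = geohash_encode(-73.750333, 42.736103, 5)
```

Each additional character splits a cell into 32 smaller cells, so a key at precision 5 is a prefix of the key at precision 7 for the same point. A higher precision therefore produces many more distinct join keys for the same data, which is one plausible reason for the memory note above.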


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceJoinOption.DistanceColumnName -> "outputDistance"))
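The result such a join is expected to produce can be sketched without Spark. The following Python sketch is a brute-force illustration of "pair every point with every point within the maximum distance". It assumes a spherical haversine distance and reports the distance in miles; the product's actual distance calculation, the unit of its distance column, and its geohash-based primary join are not shown in this guide, and the function names and sample rows are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius (spherical approximation)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Pair each left row with every right row within max_miles of it."""
    limit_m = max_miles * METERS_PER_MILE
    joined = []
    for left_row in left:
        for right_row in right:
            d = haversine_m(left_row["longitude"], left_row["latitude"],
                            right_row["lon"], right_row["lat"])
            if d <= limit_m:
                # Merge both rows and add the distance as an extra column.
                joined.append({**left_row, **right_row,
                               "outputDistance": d / METERS_PER_MILE})
    return joined


# Hypothetical sample data: one customer and two points of interest.
customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.750333, "lat": 42.741103},  # about 0.35 mi north
    {"poi": "far", "lon": -73.750333, "lat": 42.756103},   # about 1.4 mi north
]
matches = join_by_distance(customers, pois, 0.5)
```

Only the "near" row survives the 0.5 mile limit. A brute-force comparison like this grows with the product of the two row counts, which is why a distributed join would first bucket points by a coarse key such as a geohash.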


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, and the directory should have Read, Write, and Execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Set the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
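After running the commands above, it can be useful to confirm that a directory really carries the 775 permissions. The following is a small Python sketch of such a check on a POSIX system. It runs against a throwaway temporary directory rather than the real download location, and the helper names describe_mode and group_can_update are hypothetical.

```python
import os
import stat
import tempfile


def describe_mode(path):
    """Return the permission bits of path as an int and as an ls-style string."""
    st = os.stat(path)
    return stat.S_IMODE(st.st_mode), stat.filemode(st.st_mode)


def group_can_update(path):
    """True if the owning group has read, write, and execute on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & stat.S_IRWXG) == stat.S_IRWXG


# Demonstrate on a temporary directory instead of the real download location.
with tempfile.TemporaryDirectory() as demo_dir:
    os.chmod(demo_dir, 0o775)  # same effect as: chmod 775 <dir>
    mode_bits, mode_text = describe_mode(demo_dir)
    demo_ok = group_can_update(demo_dir)
```

Mode 775 reads as rwx for the owner, rwx for the group, and r-x for everyone else, which is what lets every member of the common group update the downloaded data. To check a real installation, the same helpers could be pointed at the directory configured as the download location.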


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
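The wildcard rules for Like and the inclusive behavior of Between can be sketched in a few lines of Python. This is an illustration of the semantics described in the table, not the MI SQL engine itself. It assumes case-sensitive matching and no escape character, neither of which this guide specifies, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into an equivalent compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, low, high):
    """Inclusive range test, mirroring the Between operator."""
    return low <= value <= high
```

For example, like("Central Park", "%Park") is true because % absorbs the leading text, while like("Park Avenue", "%Park") is false because nothing may follow Park. Combining both wildcards, as in "%P_rk%", is also allowed.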


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List item and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain characters that normally would not be allowed are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
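When filter expressions are assembled programmatically, the quote rules above can be applied mechanically. The following is a minimal Python sketch. The helper names quote_literal and quote_identifier are hypothetical, and because this guide does not say how a double quote inside an identifier should be written, the sketch simply rejects that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Escaping of embedded double quotes is not covered by the guide.
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
query_country = f"SELECT * FROM {table} WHERE Country = {quote_literal('Canada')}"
query_business = "SELECT * FROM Streets WHERE Business = " + quote_literal("O'hara's")
```

Building the literal through a helper, rather than by string concatenation at each call site, keeps values such as O'hara's from terminating the string early.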


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
        • Grid Functions
          • GeoHashBoundary
          • GeoHashID
          • HexagonBoundary
          • HexagonID
          • SquareHashBoundary
          • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
          • Hexagons
      • Spark API
        • JoinByDistance
        • Location Intelligence Jar for Spatial Operations
          • Installing and setting up the Location Intelligence Jar for a Spark Job
          • Integrating the Location Intelligence Jar with Zeppelin
          • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules
ToWKB

Description

The ToWKB function returns the Well-Known Binary (WKB) representation of the geometry held by the specified WritableGeometry instance, expressed as a byte array.
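To make the byte layout concrete, the following Python sketch hand-builds the WKB encoding of a single 2D point using the standard OGC layout (byte-order flag, geometry type code, then the ordinates). It illustrates the format only and is not the product's implementation; `point_to_wkb` is a hypothetical helper name.

```python
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # '<'  : little-endian, no padding
    # 'B'  : byte-order flag (1 = little-endian)
    # 'I'  : geometry type code (1 = Point)
    # 'dd' : the X and Y ordinates as 8-byte doubles
    return struct.pack("<BIdd", 1, 1, x, y)


wkb = point_to_wkb(10.0, 20.0)
print(len(wkb))   # 21 bytes: 1 + 4 + 8 + 8
print(wkb.hex())
```

Other geometry types follow the same pattern with a different type code and a count-prefixed list of points or rings.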

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

Byte[ ] The WKB representation of a geometry expressed as a byte array

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), generated from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of a geometry expressed as a hexadecimal encoded string

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests whether two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.
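The return rules can be sketched in Python with axis-aligned rectangles standing in for geometries. This is a simplified illustration under that assumption, not the product's algorithm; `Rect`, `intersects`, and `disjoint` are hypothetical names. It shows that Disjoint is the logical negation of Intersects and that a null input yields a null result.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a geometry (illustrative only)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Optional[Rect], b: Optional[Rect]) -> Optional[bool]:
    # A null input yields a null result, mirroring the rule above.
    if a is None or b is None:
        return None
    # Closed rectangles share a point unless one lies strictly beside the other.
    return not (a.xmax < b.xmin or b.xmax < a.xmin or
                a.ymax < b.ymin or b.ymax < a.ymin)


def disjoint(a: Optional[Rect], b: Optional[Rect]) -> Optional[bool]:
    hit = intersects(a, b)
    return None if hit is None else not hit


print(disjoint(Rect(0, 0, 1, 1), Rect(2, 2, 3, 3)))  # True: no shared points
print(disjoint(Rect(0, 0, 1, 1), Rect(1, 1, 2, 2)))  # False: they touch at (1, 1)
print(disjoint(None, Rect(0, 0, 1, 1)))              # None
```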

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometries have any direct position in common; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function determines whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check on the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty otherwise False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of the given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
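The exterior-minus-interior rule can be sketched in Python for the planar (Cartesian) case using the shoelace formula. This is a minimal illustration, assuming rings given as lists of (x, y) vertices; `ring_area` and `polygon_area` are hypothetical names, and the product's own computation (including the spherical case) is not shown in this guide.

```python
from typing import List, Tuple

Ring = List[Tuple[float, float]]


def ring_area(ring: Ring) -> float:
    """Unsigned planar area of a ring via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior: Ring, holes: List[Ring]) -> float:
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(h) for h in holes)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_area(outer, [hole]))  # 96.0
```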

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units below.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2
FROM hivetable t
LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
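The difference between the two computation types, and the role of the linear unit codes, can be sketched in Python for the point-to-point case. This is an illustration only: the haversine formula and the mean Earth radius used here are assumptions, since this guide does not state how the product computes spherical distances. The function names are hypothetical; the conversion factors are the standard definitions of the listed units.

```python
import math

# Metres per unit for the linear unit codes listed above.
METERS_PER_UNIT = {
    "mi": 1609.344, "km": 1000.0, "in": 0.0254, "ft": 0.3048,
    "yd": 0.9144, "mm": 0.001, "cm": 0.01, "m": 1.0,
    "survey ft": 1200.0 / 3937.0, "nmi": 1852.0,
}

# Assumed mean Earth radius in metres; the product's value is not documented here.
EARTH_RADIUS_M = 6371008.8


def spherical_distance(lon1, lat1, lon2, lat2, unit="m"):
    """Great-circle (haversine) distance between two long/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    meters = 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
    return meters / METERS_PER_UNIT[unit]


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)


print(round(spherical_distance(0, 0, 1, 0, "km"), 3))  # about 111.195 km per degree at the equator
print(cartesian_distance(0, 0, 3, 4))                  # 5.0
```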

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.
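The ring-summing rule can be sketched in Python for the planar (Cartesian) case. This is a minimal illustration, assuming rings given as lists of (x, y) vertices; `ring_length` and `polygon_perimeter` are hypothetical names and do not reflect the product's internals.

```python
import math


def ring_length(ring):
    """Planar length of a ring given as a list of (x, y) vertices."""
    # Close the ring if the last vertex does not repeat the first.
    closed = ring if ring[0] == ring[-1] else ring + [ring[0]]
    return sum(math.dist(p, q) for p, q in zip(closed, closed[1:]))


def polygon_perimeter(exterior, holes):
    """Sum of the lengths of the exterior ring and every interior ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_perimeter(outer, [hole]))  # 48.0
```

Note that a hole increases the perimeter even though it decreases the area.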

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
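The effect of the resolution parameter on a point buffer can be sketched in Python for the planar (Cartesian) case. This is an illustration under stated assumptions: the vertices are placed on the circle itself, which may differ from how the product positions them, and `buffer_point` is a hypothetical helper name.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` with `resolution` segments."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = buffer_point(0.0, 0.0, 1.0, 8)
print(len(octagon) - 1)  # 8 distinct vertices, so the buffer is an octagon
```

Doubling the resolution doubles the vertex count of every buffered arc, which is why larger values cost more time and storage.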

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
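For readers unfamiliar with the concept, the following Python sketch computes the convex hull of a set of planar points with Andrew's monotone chain algorithm. It is a generic textbook technique shown for illustration; this guide does not say which algorithm the product uses, and `convex_hull` is a hypothetical name.

```python
def convex_hull(points):
    """Convex hull of 2D points (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(convex_hull(pts))  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

The interior points (2, 2) and (1, 3) are discarded because they lie inside the square formed by the other four.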

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.
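As a concrete instance of a coordinate transformation, the following Python sketch projects a longitude/latitude pair in EPSG:4326 to EPSG:3857 (spherical Web Mercator), the pair of coordinate systems used in the examples for this function. It covers only this one well-known projection and is not the product's transformation engine; `lonlat_to_web_mercator` is a hypothetical name.

```python
import math

# Sphere radius used by EPSG:3857 (the WGS84 semi-major axis), in metres.
WEB_MERCATOR_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project EPSG:4326 degrees to EPSG:3857 metres."""
    x = WEB_MERCATOR_RADIUS_M * math.radians(lon)
    y = WEB_MERCATOR_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


x, y = lonlat_to_web_mercator(30.0, 20.0)
print(round(x, 2), round(y, 2))
```

The projection is undefined at the poles, so latitudes must stay strictly between -90 and 90 degrees.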

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y ordinates from the transformed point geometry.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
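The bounds-filtering use case can be sketched in Python. This is a conceptual illustration with geometries reduced to lists of (x, y) vertices and table rows reduced to dictionaries; `mbr` and `filter_by_mbr` are hypothetical names, and the four values returned by `mbr` correspond to what ST_XMin, ST_YMin, ST_XMax, and ST_YMax report.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, coords):
    """Keep XY rows whose point falls inside the MBR of the geometry."""
    xmin, ymin, xmax, ymax = mbr(coords)
    return [r for r in rows if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


triangle = [(0, 0), (10, 0), (5, 8)]
rows = [{"id": 1, "x": 5, "y": 4}, {"id": 2, "x": 12, "y": 1}, {"id": 3, "x": 9, "y": 7}]
print(mbr(triangle))                                     # (0, 0, 10, 8)
print([r["id"] for r in filter_by_mbr(rows, triangle)])  # [1, 3]
```

Row 3 passes the MBR test even though it lies outside the triangle itself, so an MBR filter is best treated as a cheap first pass before an exact predicate such as Within.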

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');
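How an ID maps back to a rectangular cell can be sketched in Python with the publicly documented geohash scheme (base-32 characters, five bits each, alternating longitude and latitude bisections). That the product follows this exact scheme is an assumption; `geohash_bounds` is a hypothetical name and returns plain numbers rather than a WritableGeometry.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(cell_id):
    """Decode a standard geohash into (xmin, ymin, xmax, ymax) in degrees."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    is_lon = True  # bits alternate, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # a 1 bit keeps the upper half
            else:
                rng[1] = mid  # a 0 bit keeps the lower half
            is_lon = not is_lon
    return lon[0], lat[0], lon[1], lat[1]


print(geohash_bounds("ezs42"))
```

Each extra character narrows the cell by five more bisections, which is why longer IDs describe smaller cells.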

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
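The relationship between X, Y, precision, and the resulting ID can be sketched in Python using the publicly documented geohash encoding. Whether the product's output matches this scheme character for character is an assumption, and `geohash_id` is a hypothetical name rather than the UDF itself.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash of `precision` characters."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    chars, value, bits, is_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon, x) if is_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1  # point is in the upper half
            rng[0] = mid
        else:
            value <<= 1               # point is in the lower half
            rng[1] = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:                 # every five bits become one character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


print(geohash_id(-5.6, 42.6, 5))  # ezs42
```

Because a shorter ID is always a prefix of the longer ID for the same point, sorting or prefix-matching on the ID groups nearby points together, which is what makes the ID useful for the ORDER BY and GROUP BY patterns in the examples above.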

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable

SELECT HexagonBoundary(PF625028642)

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary("03332");

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | "shpCharset", "utf-8"
shpCrs | The coordinate reference system to use when reading a shapefile. | "shpCrs", "epsg:4326"
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | "remoteDataSourceLocation", "hdfs:///data/mydata.zip"
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). | "downloadLocation", "/pb/downloads"
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | "downloadGroup", "pbdownloads"

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set the downloadLocation option here as well.

Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
"STATECAP.TAB",
map("remoteDataSourceLocation", "hdfs:///data/pip/capitals.zip",
"downloadLocation", "/pb/pip/download", "downloadGroup", "pbdownloads"))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries (contained in the specified TAB or shapefile) that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | "maxCandidates", "3"
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | "maxDistance", "25"
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. | "distanceUnit", "mi"
returnDistanceColumnName | The name of the column to use for returning the distance. | "returnDistanceColumnName", "Miles"
shpCharset | The charset to use when reading a shapefile. | "shpCharset", "utf-8"
shpCrs | The coordinate reference system to use when reading a shapefile. | "shpCrs", "epsg:4326"
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | "remoteDataSourceLocation", "hdfs:///data/mydata.zip"
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). | "downloadLocation", "/pb/downloads"
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | "downloadGroup", "pbdownloads"
queryFilter | Search on a subset of the data based on an attribute. | "queryFilter", "Name Like 'Park'"

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set the downloadLocation option here as well.

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.

Return Values

Return Type | Description
geometry | The geometry or geometries (contained in the specified TAB or shapefile) that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
"STATECAP.TAB",
map("maxCandidates", "3",
"remoteDataSourceLocation", "hdfs:///data/search/capitals.zip",
"downloadLocation", "/pb/search/download", "downloadGroup", "pbdownloads",
"queryFilter", "Name Like 'Park'"))
nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
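The way maxCandidates and maxDistance shape the result set can be illustrated with a small brute-force Python sketch. It is not the product's implementation: it assumes point candidates only, measures great-circle (haversine) distance in meters, and uses hypothetical names (haversine_m, search_nearest) and approximate, illustrative coordinates.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to max_candidates (name, distance_m) pairs, nearest first.

    point is (lon, lat); candidates is a list of (name, lon, lat).
    max_distance=None mirrors the documented default of no limit.
    """
    scored = []
    for name, lon, lat in candidates:
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance is None or d <= max_distance:
            scored.append((name, d))
    scored.sort(key=lambda item: item[1])  # nearest first
    return scored[:max_candidates]


# Illustrative data only (approximate coordinates).
CAPITALS = [
    ("Boston", -71.0589, 42.3601),
    ("Albany", -73.7562, 42.6526),
    ("Hartford", -72.6734, 41.7658),
]

if __name__ == "__main__":
    origin = (-73.750333, 42.736103)
    for name, dist in search_nearest(origin, CAPITALS, max_candidates=3):
        print(name, round(dist))
```

With the defaults (one candidate, no distance limit) only the single nearest record is returned, which matches the documented default behavior of the two options.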

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.

Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

// Build a 0.5 mile search radius.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

// Join baseDF to joinDF using a geohash precision of 7.
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

import com.mapinfo.midev.unit.{Length, LinearUnit}

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

// DistanceColumnName is a DistanceJoinOption key; the calculated distance is written to the outputDistance column.
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
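For readers who want to reason about what a distance join returns without a Spark cluster, the following Python sketch reproduces the idea on small in-memory lists. It is a brute-force illustration under stated assumptions, not the joinByDistance implementation: it ignores the geohash-based primary join, measures great-circle distance in meters, and the function names, column names, and sample records are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters
METERS_PER_MILE = 1609.344   # international mile


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, distance_column=None):
    """Pair every left row with every right row within max_distance_m.

    left rows carry "longitude"/"latitude"; right rows carry "lon"/"lat".
    If distance_column is given, the distance is added under that key,
    similar in spirit to the DistanceColumnName option.
    """
    joined = []
    for a in left:
        for b in right:
            d = haversine_m(a["longitude"], a["latitude"], b["lon"], b["lat"])
            if d <= max_distance_m:
                row = {**a, **b}
                if distance_column is not None:
                    row[distance_column] = d
                joined.append(row)
    return joined


if __name__ == "__main__":
    customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
    pois = [
        {"poi": "near", "lon": -73.7510, "lat": 42.7390},
        {"poi": "far", "lon": -73.80, "lat": 42.80},
    ]
    half_mile = 0.5 * METERS_PER_MILE
    for row in join_by_distance(customers, pois, half_mile, "outputDistance"):
        print(row)
```

The nested loop compares every pair, which is why a production join first narrows the candidates (here, by geohash cell at the chosen precision) before computing exact distances.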

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application | Do this
Spark job | See Installing and setting up the Location Intelligence Jar for a Spark Job.
Zeppelin notebook | See Integrating the Location Intelligence Jar with Zeppelin.
Hue notebook | See Integrating the Location Intelligence Jar with Hue.

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property: Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
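When the PGD Builder is driven from a script, it can help to assemble the argument list in one place and to cap the -p value at the number of CPUs on the machine. The Python sketch below only builds the argument list shown under Usage; it does not run the tool. The helper names are hypothetical, and it assumes the executable is invoked as PGDBuilder, which may differ in your installation.

```python
import os


def capped_parallelism(limit):
    """Return a -p value no larger than limit or the CPU count, and at least 1."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(limit, cpus))


def build_pgd_command(tab_path, parallel=None):
    """Build the argument list for: PGDBuilder -f <file> [-p <parallel>]."""
    command = ["PGDBuilder", "-f", tab_path]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        command += ["-p", str(parallel)]
    return command


if __name__ == "__main__":
    # Single TAB file: rely on the tool's default parallelism.
    print(build_pgd_command(r"C:\data\uk.tab"))
    # Seamless TAB: limit the number of concurrent threads.
    print(build_pgd_command(r"C:\seamless.tab", capped_parallelism(4)))
```

The resulting list can be passed to a process launcher such as subprocess.run once the actual location of the utility is known.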

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update the permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
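To see what mode 775 grants to each class of user, the octal value can be decoded with Python's standard stat module. This is a small illustrative sketch; the helper name describe_mode is hypothetical.

```python
import stat


def describe_mode(octal_mode):
    """Return the ls-style permission string for a directory with this mode."""
    return stat.filemode(stat.S_IFDIR | octal_mode)


if __name__ == "__main__":
    mode = 0o775
    print(describe_mode(mode))
    # The group write bit is what lets every member of the common group
    # update the downloaded data.
    print("group can write:", bool(mode & stat.S_IWGRP))
    print("others can write:", bool(mode & stat.S_IWOTH))
```

Mode 775 gives the owner and the group read, write, and execute access, while all other users get read and execute access only.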

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
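The wildcard rules for Like (underscore for exactly one character, percent for zero or more characters) can be sketched in Python by translating a pattern into a regular expression. This is an illustration of the documented semantics only: the function names are hypothetical, and details such as case sensitivity and escaping of literal wildcard characters are assumptions that this guide does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    "%" matches zero or more characters and "_" matches exactly one;
    every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    for value, pattern in [
        ("Central Park", "%Park"),
        ("Park", "%Park%"),
        ("Parks", "Park_"),
        ("Central Park", "Park"),
    ]:
        print(repr(value), "Like", repr(pattern), "->", like(value, pattern))
```

Note that a pattern without wildcards behaves like an equality test in this sketch, so a filter that should match any value containing a word needs a percent sign on both sides of it.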

Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( )  Expression delimiters

' '  String constant delimiters. See Quote Rules.

" "  Quoted identifier delimiters

% _  Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

,    List items and function argument separators

@    Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
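When query text is assembled by a program, the quote rules above can be applied mechanically. The helper names in this Python sketch are hypothetical. The guide does not say how a double quote inside an identifier is escaped, so the sketch rejects such identifiers rather than guessing.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes (embedded double quotes are rejected)."""
    if '"' in name:
        raise ValueError("escaping of double quotes in identifiers is not covered here")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
query = "SELECT * FROM " + table + " WHERE Country = " + quote_literal("Canada")
print(query)
print(quote_literal("O'haras"))
```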


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes

3001 Summer Street

Stamford, CT 06926-0700

USA

www.pitneybowes.com


ToKML

Description

The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), serialized from the specified WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The instance of a WritableGeometry

Return Values

Return Type Description

String The KML representation of the geometry

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;

Predicate Functions

The following Predicate functions are available

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common, otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries, otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2, otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1, otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty, otherwise False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
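The exterior-minus-interior rule can be shown with a small planar calculation. This Python sketch uses the shoelace formula on Cartesian coordinates; the function names are hypothetical, and it does not reproduce the product's SPHERICAL computation or its unit conversion.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) tuples."""
    twice_area = 0.0
    # Pair every vertex with its successor, wrapping around to the first vertex.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(square, [hole]))
```

The result is in squared coordinate units; a closed ring that repeats its first vertex at the end gives the same value.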

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units below.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres


ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.
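The difference between the two computation types can be sketched in Python. The guide does not document the exact spherical algorithm the product uses, so the haversine formula and the mean Earth radius below are assumptions chosen only to illustrate the idea; the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumption of this sketch)


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of a planar coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)


# One degree of longitude on the equator, measured both ways.
print(spherical_distance_m(0, 0, 1, 0))   # meters along the sphere
print(cartesian_distance(0, 0, 1, 0))     # 1.0, in degrees, which is not a length
```

The second value shows why cartesian logic is not offered for geographic (long/lat) coordinate systems: the coordinates are angles, not distances.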

Linear Units

The following table lists the valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and the holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units below.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The unit in which the offset is expressed. For valid values, see Linear Units below.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.
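The effect of the resolution parameter on a point buffer can be sketched in planar coordinates. In this Python illustration the vertices are placed on the circle of radius offset; whether the product inscribes or circumscribes the polygon is not stated in this guide, so that placement and the function name are assumptions.

```python
import math


def point_buffer(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` straight segments."""
    if resolution < 3:
        raise ValueError("a polygon needs at least three segments")
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring by repeating the first vertex
    return ring


octagon = point_buffer(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)   # number of segments: 8
```

Raising the resolution adds vertices, which is why larger values take more time and space to compute.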

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles


Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
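For readers unfamiliar with the concept, the following Python sketch computes a planar convex hull with Andrew's monotone chain algorithm. It is a standard textbook technique used here only to illustrate the definition; the guide does not say which algorithm the product uses, and the function name is hypothetical.

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) is dropped because it lies inside the square formed by the other four points.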

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another
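As an illustration of what a coordinate system transformation involves, this Python sketch converts long/lat degrees (as in epsg:4326) to spherical Web Mercator meters (as in epsg:3857) using the standard closed-form projection. It does not show how the Transform UDF is implemented, and the function name is hypothetical.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by Web Mercator


def lonlat_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(lonlat_to_web_mercator(30, 20))
```

Other pairs of coordinate systems need datum shifts and different projection formulas, which is why the UDF takes the destination coordinate system as a parameter.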

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.
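The bounds-based filter described above can be sketched in Python. The four values correspond conceptually to what ST_XMin, ST_YMin, ST_XMax, and ST_YMax return for a geometry; the function names and sample data are hypothetical.

```python
def mbr(coords):
    """Minimum bounding rectangle of a coordinate list as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, box):
    """True if the point (x, y) lies inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


triangle = [(40, 40), (20, 45), (45, 30)]
box = mbr(triangle)
rows = [(25, 35), (50, 50), (45, 30)]          # records of an XY table
print([row for row in rows if in_mbr(row[0], row[1], box)])
```

A point inside the MBR is not necessarily inside the geometry itself, so this kind of filter is typically a cheap first pass before an exact predicate.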

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid.

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
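
The relationship between precision, ID length, and cell size can be illustrated outside Hive with the publicly documented geohash algorithm. The following Python sketch is an illustration only: it assumes the standard geohash encoding (base-32 alphabet, bits interleaved starting with longitude) and is not taken from the product. The function names geohash_encode and cell_size_degrees are hypothetical, and this guide does not confirm that GeoHashID returns identical strings.

```python
# Standard public geohash encoding (illustrative; not the product's code).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(x, y, precision):
    """Return the geohash of longitude x, latitude y with `precision` characters."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, x) if use_lon else (lat_range, y)
        mid = (rng[0] + rng[1]) / 2.0
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def cell_size_degrees(precision):
    """Return (width, height) in degrees of a geohash cell at this precision."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    return 360.0 / (2 ** lon_bits), 180.0 / (2 ** lat_bits)


if __name__ == "__main__":
    for p in (1, 3, 5):
        print(p, geohash_encode(-73.750333, 42.736103, p), cell_size_degrees(p))
```

Because each additional character subdivides the previous cell, a shorter ID is always a prefix of the longer ID for the same point. That prefix property is what makes the IDs convenient to sort, search, and group on.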

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid.

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid.

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that contains the search point; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup the operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that contains the search point is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'/STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries (contained in the specified TAB or shapefile) that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates the maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance the maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName the name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup the operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like "%Park%"'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries, contained in the specified TAB or shapefile, to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), '/STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like "%Park%"')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, it generates each hexagon with a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to.

df1Longitude Column The longitude value from the first dataframe.

df1Latitude Column The latitude value from the first dataframe.

df2Longitude Column The longitude value from the second dataframe.

df2Latitude Column The latitude value from the second dataframe.

maxDistance Length The buffer length around point 1 to search for point 2.

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

import com.pb.bigdata.li.spark.api.SpatialImplicits._
import org.apache.spark.sql.functions.col

val searchRadius = 0.5
val distanceUnit = "mi"
// Build a Length of 0.5 miles from the MapInfo unit code
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

import com.mapinfo.midev.unit.{Length, LinearUnit}

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

// DistanceColumnName is the DistanceJoinOption key described in the Options table
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
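
To make the semantics of a distance join concrete, the following Python sketch pairs every record in one list with each record in a second list that lies within a maximum distance, and attaches the computed distance as an extra column. It is a brute-force illustration under stated assumptions, not the product's implementation: it uses the haversine formula on a spherical earth, it skips the geohash-based primary join that the geohashPrecision parameter refers to, and the names haversine_m and join_by_distance are hypothetical. Distances computed by the product may differ slightly.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius in meters (spherical approximation)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(rows1, rows2, max_distance_m):
    """Pair each row of rows1 with every row of rows2 within max_distance_m.

    rows1 entries carry "longitude"/"latitude"; rows2 entries carry "lon"/"lat".
    The computed distance is added under the (illustrative) key "outputDistance".
    """
    result = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_m(r1["longitude"], r1["latitude"], r2["lon"], r2["lat"])
            if d <= max_distance_m:
                result.append({**r1, **r2, "outputDistance": d})
    return result


if __name__ == "__main__":
    customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
    pois = [
        {"name": "near", "lon": -73.7550, "lat": 42.7361},
        {"name": "far", "lon": -73.9000, "lat": 42.7361},
    ]
    # Half a mile, expressed in meters
    for row in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
        print(row["id"], row["name"], round(row["outputDistance"]))
```

Comparing every pair scales with the product of the two input sizes, which is why a distributed join would normally narrow the candidate pairs first, for example by grid cell, before applying the exact distance test.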

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download the data and to update the downloaded data when required. Create a common operating system group that contains all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
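To make the meaning of mode 775 concrete, the following minimal sketch decodes the permission bits with Python's standard stat module. It only illustrates the octal value; the helper name describe_mode is hypothetical and not part of the product.

```python
import stat


def describe_mode(mode: int) -> str:
    """Return the ls-style permission string for a directory with the given mode."""
    # S_IFDIR marks the entry as a directory so the string starts with 'd'.
    return stat.filemode(stat.S_IFDIR | mode)


# Owner and group get read, write, and execute; others get read and execute only.
print(describe_mode(0o775))  # drwxrwxr-x
```

The owner and group triplets are both rwx, which is what lets every member of the common group update the downloaded data.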

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
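The wildcard rules described for the Like operator can be sketched as a translation to a regular expression. This is only an illustration of the matching semantics stated above, written in Python; the function names are hypothetical, and case-sensitive matching is an assumption of the sketch, not a documented behavior of MI SQL.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a Like pattern: '%' matches any run of characters, '_' exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Every other character must match literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the Like pattern (assumed case-sensitive)."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Hudson River", "Hud%"))  # True
print(like("cat", "c_t"))            # True
print(like("cart", "c_t"))           # False
```

The two symbols can be combined, for example "_a%" matches any value whose second character is "a".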

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
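When query text is assembled by a program, the quote rules above can be applied mechanically. The following Python sketch shows one way to do that; the helper names and the sample values are hypothetical, and rejecting identifiers that contain a double quote is a choice made here because this guide does not describe how to escape them.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes (needed for spaces or special characters)."""
    if '"' in name:
        # Escaping embedded double quotes is not covered by the rules above.
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("My Streets"), quote_literal("Joe's Diner")
)
print(query)  # SELECT * FROM "My Streets" WHERE Business = 'Joe''s Diner'
```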

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

Spatial

Predicate Functions

The following Predicate functions are available:

• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry

Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometry objects have no points in common, otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326'))
FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if there is any direct position in common between the two geometries, otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326'))
FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway
FROM USA_RIVERS L, usa_highways R
WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2, otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326'))
FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1, otherwise False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result
FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty, otherwise False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
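The rule that a polygon's area is the exterior ring minus the interior rings can be sketched as follows. This Python example assumes planar (Cartesian) coordinates and uses the shoelace formula; it is an illustration of the rule, not the algorithm the Area function itself uses, and the function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def ring_area(ring: Sequence[Point]) -> float:
    """Unsigned planar area of a ring (shoelace formula); works for open or closed rings."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior: Sequence[Point], interiors: Iterable[Sequence[Point]] = ()) -> float:
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
print(polygon_area(square, [hole]))  # 96.0
```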

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units on page 30.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2
FROM hivetable t
LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
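To illustrate why the computation type matters, the sketch below contrasts a planar (Cartesian) distance with a great-circle distance on a sphere. This guide does not say which spherical algorithm the product uses, so the haversine formula and the mean Earth radius constant here are stand-ins chosen for the illustration, and the function names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters; an assumption of this sketch


def cartesian_distance(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Straight-line distance between two points in planar coordinate units."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_m(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in meters between (longitude, latitude) points in degrees."""
    lon1, lat1 = math.radians(p[0]), math.radians(p[1])
    lon2, lat2 = math.radians(q[0]), math.radians(q[1])
    # Haversine formula.
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


print(cartesian_distance((0, 0), (3, 4)))  # 5.0
print(spherical_distance_m((0, 0), (90, 0)))  # a quarter of the sphere's circumference
```

Treating long/lat degrees as planar coordinates gives a result in degrees rather than a ground distance, which is why geographic coordinate systems only accept SPHERICAL.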

Linear Units

The following table lists the valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
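The effect of the resolution parameter can be sketched for the simplest case, a buffer around a single point. The Python example below works in planar coordinates and places the vertices on the circle itself; both are assumptions of the sketch, since this guide does not describe how the Buffer function positions vertices. The function name buffer_point is hypothetical.

```python
import math
from typing import List, Tuple


def buffer_point(x: float, y: float, offset: float, resolution: int) -> List[Tuple[float, float]]:
    """Approximate a circle of radius `offset` around (x, y) with `resolution` straight segments.

    Returns a closed ring, so the first vertex is repeated at the end.
    """
    if resolution < 3:
        raise ValueError("resolution must be at least 3 to form a polygon")
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])
    return ring


octagon = buffer_point(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)  # 8 segments, so the buffer of a point is an octagon
```

Raising the resolution adds vertices, which is why larger values cost more time and space.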

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
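For readers unfamiliar with the concept, the following Python sketch computes a convex hull for a set of planar points using Andrew's monotone chain algorithm. It illustrates the definition above only; this guide does not state which algorithm the ConvexHull function uses, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)])
print(hull)  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

The interior point (2, 2) is dropped because it lies inside the smallest convex shape that contains all the input points.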

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))
FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')))
FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR for a writable geometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
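The bounds-filtering use case described above can be sketched outside Hive as follows. This Python example computes the minimum bounding rectangle (MBR) of a geometry's vertices and keeps only the XY rows that fall inside it; the function names, the row layout, and the sample data are all hypothetical and serve only to show the idea behind the ST_XMin, ST_XMax, ST_YMin, and ST_YMax values.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(points: Iterable[Point]) -> Bounds:
    """Return the minimum bounding rectangle of a set of vertices."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot compute the MBR of an empty geometry")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)


def rows_within_bounds(rows: Iterable[Dict[str, float]], bounds: Bounds) -> List[Dict[str, float]]:
    """Keep the XY rows whose x and y values lie inside the bounds (inclusive)."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in rows if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


triangle = [(0.0, 0.0), (4.0, 1.0), (2.0, 5.0)]
rows = [{"id": 1, "x": 1.0, "y": 1.0}, {"id": 2, "x": 5.0, "y": 1.0}]
print(mbr(triangle))  # (0.0, 0.0, 4.0, 5.0)
print([r["id"] for r in rows_within_bounds(rows, mbr(triangle))])  # [1]
```

A bounds test like this is a coarse filter: a row inside the MBR is not necessarily inside the geometry itself.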

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the X maxima (the largest X value) of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the X minima (the smallest X value) of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the Y maxima (the largest Y value) of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the Y minima (the smallest Y value) of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash, but has the advantage that when displayed in Popular Mercator the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
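To show what hashing a point into a grid cell and aggregating by cell looks like, here is a minimal Python sketch of the publicly documented geohash encoding. It is an illustration under assumptions, not the product's code: the function names geohash_encode and count_by_cell are hypothetical, and whether the GeoHashID UDF produces exactly the same IDs as this standard algorithm is not confirmed by this guide.

```python
from collections import Counter

# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Encode a WGS 84 longitude/latitude as a geohash of `precision` characters."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0         # bits accumulated for the current character
    bit_count = 0
    use_lon = True   # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid   # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def count_by_cell(points, precision):
    """Group (lon, lat) points by geohash cell and count the points per cell."""
    return Counter(geohash_encode(lon, lat, precision) for lon, lat in points)


# Hypothetical sample points: two close together, one far away.
counts = count_by_cell([(-5.6, 42.6), (-5.6001, 42.6001), (10.0, 50.0)], 5)
```

Because every additional character halves the cell several more times, a longer ID always identifies a smaller cell nested inside the cell named by its prefix, which is what makes the IDs convenient to sort, search, and group.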


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
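The reverse direction, from a cell ID back to a rectangular boundary, can be sketched in Python as follows. This decodes IDs of the standard public geohash scheme; the names geohash_bounds and bounds_to_wkt are hypothetical, and the sketch assumes, without confirmation from this guide, that the product's geohash IDs follow that scheme.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(cell_id: str):
    """Return (xmin, ymin, xmax, ymax) in WGS 84 for a standard geohash ID."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    use_lon = True  # bits alternate between longitude and latitude
    for ch in cell_id:
        value = BASE32.index(ch)  # raises ValueError for an invalid character
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (value >> shift) & 1
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_range[0], lat_range[0], lon_range[1], lat_range[1]


def bounds_to_wkt(bounds) -> str:
    """Format a bounding box as a closed WKT polygon ring."""
    xmin, ymin, xmax, ymax = bounds
    return (f"POLYGON (({xmin} {ymin}, {xmax} {ymin}, {xmax} {ymax}, "
            f"{xmin} {ymax}, {xmin} {ymin}))")


cell = geohash_bounds("ezs42")
```

Passing the result to bounds_to_wkt gives text comparable in spirit to wrapping GeoHashBoundary in ToWKT, as the aggregation examples in the GeoHashID section do.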


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)


Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
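The core test behind a point-in-polygon search can be sketched in Python with the classic ray-casting algorithm. This is a simplified illustration, not the product's algorithm: the names point_in_polygon and find_containing and the sample polygons are hypothetical, only simple rings without holes are handled, and the boundary cases the UDTF covers (points lying on lines or edges) are not treated here.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices.

    Points exactly on an edge may be classified either way; production
    libraries handle that boundary case explicitly.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def find_containing(x, y, polygons):
    """Return the attribute records of every polygon that contains the point."""
    return [attrs for attrs, ring in polygons if point_in_polygon(x, y, ring)]


# Hypothetical data source: two adjacent square "states" with attributes.
polygons = [
    ({"state": "A"}, [(0, 0), (4, 0), (4, 4), (0, 4)]),
    ({"state": "B"}, [(4, 0), (8, 0), (8, 4), (4, 4)]),
]
```

Returning a list rather than a single record mirrors the table-generating behavior described above, where one input point can produce zero, one, or several output rows.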


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
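How the maxCandidates and maxDistance options shape the result can be illustrated with a small Python sketch. It is a brute-force stand-in, not the product's search: the names haversine_m and search_nearest are hypothetical, distances are measured point to point on a sphere rather than to arbitrary geometries, and the city coordinates are approximate values used only for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(lon, lat, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance_m, record) pairs, nearest first.

    candidates is an iterable of (lon, lat, record) tuples. A max_distance_m
    of None means no distance limit.
    """
    scored = []
    for c_lon, c_lat, record in candidates:
        d = haversine_m(lon, lat, c_lon, c_lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, record))
    scored.sort(key=lambda pair: pair[0])
    return scored[:max_candidates]


# Approximate coordinates, for illustration only.
capitals = [
    (-71.0589, 42.3601, "Boston"),
    (-73.7562, 42.6526, "Albany"),
    (-72.6734, 41.7658, "Hartford"),
]
nearest_two = search_nearest(-73.750333, 42.736103, capitals, max_candidates=2)
```

With the defaults, only the single closest record comes back, which matches the documented default of 1 for maxCandidates; adding a distance limit can shrink the result further, down to an empty list.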


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes based on longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional Options that add extra attributes to the result of the join


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
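The semantics of a distance join, including the optional distance column, can be illustrated outside Spark with a short Python sketch. It is a deliberately naive nested-loop version under stated assumptions: the names haversine_mi and join_by_distance, the row layout (dicts with lon and lat keys), and the left_/right_ column prefixes are all hypothetical, and the geohash-based primary join that the geohashPrecision parameter controls is skipped entirely.

```python
import math

EARTH_RADIUS_MI = 3958.7613  # mean earth radius in miles (spherical approximation)


def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_mi, distance_column=None):
    """Pair every left row with every right row within max_distance_mi.

    Rows are dicts with 'lon' and 'lat' keys. When distance_column is set,
    the computed distance is added to each joined row under that name.
    """
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_mi(l_row["lon"], l_row["lat"], r_row["lon"], r_row["lat"])
            if d <= max_distance_mi:
                row = {f"left_{k}": v for k, v in l_row.items()}
                row.update({f"right_{k}": v for k, v in r_row.items()})
                if distance_column:
                    row[distance_column] = d
                joined.append(row)
    return joined


# Hypothetical sample rows: one customer and two points of interest.
customers = [{"id": 1, "lon": -73.750333, "lat": 42.736103}]
pois = [
    {"name": "near", "lon": -73.7450, "lat": 42.7380},
    {"name": "far", "lon": -73.70, "lat": 42.80},
]
result = join_by_distance(customers, pois, 0.5, distance_column="outputDistance")
```

A nested loop compares every pair of rows, which does not scale to large inputs; grouping both sides into coarse grid cells first, so that only rows in nearby cells are compared, is the usual way to cut that cost.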


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here.
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here.
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. You can do this by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
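When the PGD Builder is called from a batch script, the usage line above can be turned into an argument list. The following Python sketch is only an illustration: the helper names pgd_builder_args and capped_parallel are hypothetical, and the launcher name PGDBuilder is taken from the usage line without assuming a file extension or install path.

```python
import os


def capped_parallel(requested):
    """Cap a requested -p value at the CPU count, which the guide names as the default."""
    cpus = os.cpu_count() or 1
    return max(1, min(requested, cpus))


def pgd_builder_args(tab_path, parallel=None):
    """Build the PGDBuilder argument list: -f is required, -p is optional."""
    args = ["PGDBuilder", "-f", tab_path]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        args += ["-p", str(parallel)]
    return args


if __name__ == "__main__":
    # Seamless TAB limited to at most 4 concurrent threads.
    print(pgd_builder_args(r"C:\seamless.tab", capped_parallel(4)))
```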

Download Permissions

Setting the download permissions allows multiple services to download the data and update it when required. Create a common operating system group that contains every service user who needs to download the data. For example, if Hive and YARN jobs both need to download data and use the same download location, then the Hive and YARN operating system users should both belong to that group. The download directory should be assigned to this common group and should have read, write, and execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
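After running the commands above, you may want to confirm that the directory really carries the expected permission bits. The following Python sketch uses only the standard library; the function names are hypothetical, and the path you pass in is whatever your download location is.

```python
import os
import stat


def permission_bits(path):
    """Return the rwx permission bits (for example 0o775) of a file or directory."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o777


def is_mode_775(path):
    """True if owner and group have rwx and others have r-x."""
    return permission_bits(path) == 0o775


def owner_and_group_have_rwx(path):
    """True if both the owner and the group can read, write, and execute."""
    return (permission_bits(path) & 0o770) == 0o770
```

For example, is_mode_775("/pb/downloads") should return True once step 5 has been applied, assuming that is your download directory.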

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators: =, <>, !=, <, <=, >, >=

Between: Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects: Returns true if the envelopes (MBRs) of the operands intersect.

Contains: Returns true if the first object contains all of the second object.

Within: Returns true if the first object is entirely inside the second object.

ContainsCentroid: Returns true if the first object contains the centroid of the second object.

CentroidWithin: Returns true if the first object's centroid is within the second object.

Intersects: Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List): Returns true if the value equals at least one of the values in the literal list or subquery.

Like: Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND: Returns true if both conditions in the WHERE clause are true.

OR: Returns true if either the first or second condition is true.

NOT: Reverses the meaning of the logical operator with which it is used.
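The wildcard behavior described for the Like operator can be illustrated outside of MI SQL. The following Python sketch translates a Like pattern into a regular expression. It is an approximation for illustration only: the function names are hypothetical, matching here is case-sensitive, and escape sequences for literal % or _ are not handled, since the guide does not describe them.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("Hudson River", "Hud%"))   # prefix match
    print(like("cat", "c_t"))             # single-character wildcard
```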

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List item and function argument separator

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parser cannot otherwise parse them correctly. This includes identifiers that contain spaces or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed such as ) are surrounded by double-quotes

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

When a single quote appears within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
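If queries are generated by a program, the quoting rules above can be applied with two small helpers. The following Python sketch is an illustration, not a product API: the function names are hypothetical, and because the guide does not say how a double quote inside an identifier should be escaped, the identifier helper simply rejects that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (needed for spaces or special characters)."""
    if '"' in name:
        # Escaping of embedded double quotes is not covered by the guide.
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    query = "SELECT * FROM {} WHERE Business = {}".format(
        quote_identifier("Streets"), quote_literal("O'haras")
    )
    print(query)
```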

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

Intersects

Description

The Intersects function determines whether one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if the two geometries have any direct position in common; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry1 overlaps geometry2; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function determines whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise, False.

If either geometry1 or geometry2 is null, Null is returned.

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty; otherwise, False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units.

computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry.

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
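The rule that a polygon's area is its exterior ring minus its interior rings can be sketched in a few lines. The Python below is an illustration under simplifying assumptions: it uses planar (CARTESIAN-style) coordinates and the shoelace formula, does no unit conversion, and does not reproduce the SPHERICAL computation. The function names are hypothetical.

```python
def ring_area(ring):
    """Planar area enclosed by a ring of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]   # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_area(square, [hole]))  # 100 - 4 = 96.0
```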

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
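To show why the computation type matters, the following Python sketch contrasts a planar distance with a great-circle distance between two points. It is an illustration only and is not the product's algorithm: the haversine formula and the mean Earth radius used here are assumptions, the function names are hypothetical, and only point-to-point distances are covered.

```python
import math

# Mean Earth radius in meters (an assumption for this sketch).
EARTH_RADIUS_M = 6371008.8


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # One degree of longitude at the equator, interpreted both ways.
    print(cartesian_distance(0, 0, 1, 0))     # 1.0 (degrees, not meters)
    print(spherical_distance_m(0, 0, 1, 0))   # roughly 111 km
```

The planar result is expressed in coordinate units, which for long/lat data are degrees; that is one reason spherical logic is the default for geographic coordinate systems.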

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry.

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
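If results returned in one linear unit need to be compared with values in another, the unit codes listed above can be converted through meters. The following Python sketch is an illustration: the table and function names are hypothetical, and the conversion factors are the standard international definitions (with the US Survey foot taken as 1200/3937 m), not values read from the product.

```python
# Meters per unit for the linear unit codes listed in the table above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two unit codes by going through meters."""
    try:
        factor_from = METERS_PER_UNIT[from_unit]
        factor_to = METERS_PER_UNIT[to_unit]
    except KeyError as exc:
        raise ValueError("unknown linear unit: {}".format(exc.args[0]))
    return value * factor_from / factor_to


if __name__ == "__main__":
    print(convert_length(1, "mi", "km"))
```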

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry.

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
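The statement that a polygon's perimeter is the sum of the lengths of all its rings, holes included, can be sketched as follows. This Python illustration assumes planar (CARTESIAN-style) coordinates and hypothetical function names; it does not reproduce the SPHERICAL computation or any unit conversion.

```python
import math


def ring_length(ring):
    """Length of a closed ring of (x, y) vertices; the ring is closed if needed."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])           # add the closing segment
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def polygon_perimeter(exterior, interiors=()):
    """Sum of the exterior ring length and the lengths of all interior rings."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in interiors)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_perimeter(square, [hole]))  # 40 + 8 = 48.0
```

Note that holes add to the perimeter, whereas for area they are subtracted.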

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry.

linearUnits String The desired return unit type. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry.

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
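The effect of the resolution parameter is easiest to see for a single point: the buffer is a regular polygon with that many sides. The following Python sketch illustrates this in planar coordinates only. It is not the product's algorithm, the function name is hypothetical, and it ignores units, spherical logic, and non-point geometries.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` straight segments.

    Returns the polygon's vertices; each one lies exactly `offset` from the center.
    """
    if resolution < 3:
        raise ValueError("resolution must be at least 3 to form a polygon")
    step = 2 * math.pi / resolution
    return [
        (x + offset * math.cos(k * step), y + offset * math.sin(k * step))
        for k in range(resolution)
    ]


if __name__ == "__main__":
    octagon = buffer_point(5, 6, 50, 8)
    print(len(octagon))  # 8 vertices, so 8 segments once the ring is closed
```

Raising the resolution brings the polygon closer to a true circle at the cost of more vertices, which matches the guide's remark about time and space.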

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry.

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
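For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. This is a textbook technique chosen for illustration; the guide does not say which algorithm the ConvexHull function uses, and the function names here are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a right turn or lie on a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]))
```

Interior points such as (2, 2) and (1, 3) are discarded, leaving only the corners of the enclosing square.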

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.
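For intuition about what such a transformation involves, the Python sketch below applies the standard spherical Mercator forward formula to take a WGS 84 longitude/latitude (epsg:4326) to Popular Mercator (epsg:3857). This is only a worked formula for one pair of coordinate systems, not the product's implementation; wgs84_to_web_mercator is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius by epsg:3857


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project a longitude/latitude in degrees to epsg:3857 meters.

    Valid for latitudes strictly between -90 and 90 degrees; the projection
    is conventionally clipped to about +/-85.05 degrees.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y
```

The X value scales linearly with longitude, while the Y value grows faster toward the poles, which is why the same ground distance appears larger at high latitudes on a Mercator map.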

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means it cannot directly transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs return the ordinates of a point geometry, which allows an XY table to be transformed from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
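The bounds-based filtering described above can be sketched in a few lines of Python. The example computes the minimum bounding rectangle of a geometry's vertices and keeps only the XY records that fall inside it. The helper names mbr and filter_by_mbr, and the record layout with "x" and "y" keys, are assumptions made for illustration; in Hive the four bounds would come from ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(vertices):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(records, vertices):
    """Keep the records whose x/y values lie within the MBR of the vertices."""
    xmin, ymin, xmax, ymax = mbr(vertices)
    return [
        r for r in records
        if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax
    ]
```

An MBR test is only a coarse filter: a record inside the rectangle may still lie outside the geometry itself, so a precise containment check can follow when needed.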

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

Three types of UDFs are provided, one for each grid cell shape:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to geohash but has the advantage that the cells appear as squares when displayed in Popular Mercator.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
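The encode/decode idea behind an ID function and its matching boundary function can be sketched with the public geohash scheme. The Python below is a minimal illustration, assuming the product's geohash IDs follow that public scheme (the guide calls them "well-known" but does not spell out the encoding). The names geohash_id and geohash_boundary are hypothetical and are not the Hive UDFs themselves.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a WGS 84 longitude/latitude as a geohash of `precision` characters."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash to its cell bounds (min_lon, min_lat, max_lon, max_lat)."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        index = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2.0
            if (index >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_range[0], lat_range[0], lon_range[1], lat_range[1]
```

Because each extra character halves the cell several more times, a longer ID always names a cell nested inside the cell of its prefix. That nesting is what makes the IDs convenient for sorting, prefix searches, and GROUP BY aggregation.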

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary("ebvnk");

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary("-73.750333", "42.736103", 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique hexagon identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary("PF625028642");

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique square hash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary("03332");

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries on or within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
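The core containment test behind a point-in-polygon search can be sketched in Python with the classic ray-casting rule: a point is inside a ring if a horizontal ray from the point crosses the ring's edges an odd number of times. This is a conceptual sketch only; the guide does not describe how the UDTF is implemented, and point_in_polygon, polygons_containing, and the (attributes, ring) record layout are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) lies inside the polygon ring, given as (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # the ray to the right crosses this edge
    return inside


def polygons_containing(point, polygons):
    """Return the attributes of every polygon that contains the point.

    `polygons` is a list of (attributes, ring) pairs standing in for the rows
    of a TAB file or shapefile.
    """
    x, y = point
    return [attrs for attrs, ring in polygons if point_in_polygon(x, y, ring)]
```

Several polygons can contain the same point, which is why the search can return more than one row per input point.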

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries on or within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, there is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like "Park"'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like "Park"')) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. Each hexagon is generated with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
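The semantics of a distance join can be sketched outside Spark with a brute-force Python version: every pair of records whose great-circle distance is within the limit is emitted as one joined row. This is only a conceptual sketch. It compares all pairs, whereas the Spark method takes a geohashPrecision argument for its primary join. The names haversine_m and join_by_distance, the field names, and the mean earth radius used here are assumptions made for illustration.

```python
import math

METERS_PER_MILE = 1609.344
MEAN_EARTH_RADIUS_M = 6371008.8  # assumed sphere radius for the sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2
    return 2.0 * MEAN_EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Join every left record to every right record within max_distance_m.

    Left records carry "longitude"/"latitude" keys and right records carry
    "lon"/"lat" keys; the computed distance is added as "distance_m".
    """
    joined = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                joined.append({**l, **r, "distance_m": d})
    return joined
```

For example, join_by_distance(customers, pois, 0.5 * METERS_PER_MILE) pairs each customer with the POIs within half a mile. The all-pairs loop is quadratic, which is the cost that a grid-based primary join is meant to avoid on large inputs.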

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join
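The geohashPrecision parameter sets the length of the geohash used for the primary join, and longer hashes correspond to smaller cells. This guide does not show how the SDK computes geohashes internally, so the following is only a minimal standalone sketch of the standard public geohash encoding, written in Python for illustration. The function name `geohash_encode` is hypothetical and is not part of the SDK.

```python
# Illustrative sketch only: standard geohash encoding, not the SDK's implementation.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Encode a WGS84 longitude/latitude pair as a geohash of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Each extra character splits the cell into 32 smaller cells, so a hash at precision 7 always starts with the same point's hash at precision 5. This is why a higher precision produces many more distinct join keys.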

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
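To make the semantics of a distance join concrete, here is a small brute-force sketch in Python that pairs every record of one list with every record of another and keeps the pairs within a maximum distance. It is an illustration under stated assumptions, not the SDK's algorithm: it uses the haversine formula with an approximate mean Earth radius, it skips the geohash-based primary join, and the names `haversine_mi` and `join_by_distance` are hypothetical.

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean Earth radius in miles (assumption)


def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_mi):
    """Return (left_id, right_id, distance) for every pair within max_distance_mi."""
    results = []
    for l in left:
        for r in right:
            d = haversine_mi(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_mi:
                results.append((l["id"], r["id"], d))
    return results


customers = [{"id": "c1", "longitude": -73.70, "latitude": 42.700}]
pois = [
    {"id": "near", "lon": -73.70, "lat": 42.705},  # roughly a third of a mile north
    {"id": "far", "lon": -73.70, "lat": 42.710},   # roughly two thirds of a mile north
]
matches = join_by_distance(customers, pois, 0.5)
```

With a 0.5 mile search radius, only the "near" point is joined to the customer. The nested loop compares every pair, which is exactly the cost that a geohash-based primary join is meant to reduce on large inputs.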


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

| Application | Do this |
| --- | --- |
| Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job |
| Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin |
| Hue notebook | Integrating the Location Intelligence Jar with Hue |

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

spark-submit --class com.example.sparkapp.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| -f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file. |
| -p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine. |

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have 775 permissions, which grant Read, Write, and Execute to the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
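As a quick illustration of what the octal mode 775 grants, the following Python sketch decodes a mode into the familiar rwx notation using only the standard library. The helper names `describe_dir_mode` and `group_can_write` are hypothetical, and the sketch inspects a mode value rather than any real directory.

```python
import stat


def describe_dir_mode(mode):
    """Render an octal permission mode for a directory in ls-style notation."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """Return True if members of the owning group may write."""
    return bool(mode & stat.S_IWGRP)


download_dir_mode = 0o775
summary = describe_dir_mode(download_dir_mode)
```

For 775, the owner and the group have read, write, and execute access, while all other users have read and execute access only. The group write bit is what lets every service user in the common group update the downloaded data.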


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

| Operator | Definition |
| --- | --- |
| Attribute operators | =, <>, !=, <, <=, >, >= |
| Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator. |
| EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect. |
| Contains | Returns true if the first object contains all of the second object. |
| Within | Returns true if the first object is entirely inside the second object. |
| ContainsCentroid | Returns true if the first object contains the centroid of the second object. |
| CentroidWithin | Returns true if the first object's centroid is within the second object. |
| Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object. |
| In (List) | Returns true if the value equals at least one of the values in the literal list or sub query. |
| Like | Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination. |
| AND | Returns true if both conditions in the WHERE clause are true. |
| OR | Returns true if either the first or second condition is true. |
| NOT | Reverses the meaning of the logical operator with which it is used. |
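The wildcard rules for the Like operator can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is not the MI SQL engine's implementation. The names `like_to_regex` and `like` are hypothetical, matching is assumed to be case sensitive, and escape characters are not handled.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")   # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For example, `like("Troy", "T%")` and `like("Troy", "Tr_y")` both match, while `like("Troy", "T_y")` does not, because a single underscore cannot stand for two characters.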


Arithmetic Operators

| Operator | Definition |
| --- | --- |
| + | Addition; also the concatenation operator. Note: String concatenation also uses &. |
| - | Subtraction |
| * | Multiplication |
| / | Division |
| ^ | Exponentiation |

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

| Delimiter | Definition |
| --- | --- |
| ( ) | Expression delimiters |
| ' ' | String constant delimiters. See Quote Rules. |
| " " | Quoted identifier delimiters |
| % _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character |
| , | List items and function argument separators |
| @ | Parameter names |

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers that contain characters that normally would not be allowed (such as the slashes in the table path below) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
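When queries are assembled programmatically, the quote rules can be applied with two small helpers. The following Python sketch is an assumption-laden illustration: the function names `quote_literal` and `quote_identifier` are hypothetical, and identifiers that themselves contain a double quote are not handled because this guide does not describe that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier (table, column, or alias name) in double quotes."""
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'haras")
)
```

The resulting query text is `SELECT * FROM "Streets" WHERE Business = 'O''haras'`. Quoting the identifier is optional here, since Streets contains no spaces or special characters.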


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc. All rights reserved


Overlaps

Description

The Overlaps function determines whether one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry1 | WritableGeometry | The first instance of a WritableGeometry. |
| geometry2 | WritableGeometry | The second instance of a WritableGeometry. |

Return Values

| Return Type | Description |
| --- | --- |
| Boolean | True if geometry1 overlaps geometry2; otherwise, False. If either geometry1 or geometry2 is null, Null is returned. |

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;


Within

Description

The Within function returns whether one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry1 | WritableGeometry | The first instance of a WritableGeometry. |
| geometry2 | WritableGeometry | The second instance of a WritableGeometry. |

Return Values

| Return Type | Description |
| --- | --- |
| Boolean | True if geometry2 entirely contains geometry1; otherwise, False. If either geometry1 or geometry2 is null, Null is returned. |

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;
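For the common case of testing whether a point lies within a polygon, as in the insurance example above, the idea behind the predicate can be sketched with the classic ray-casting test. This Python sketch is an illustration only and is not how the SDK evaluates Within: it assumes planar coordinates and a single ring without holes, it leaves the result for points exactly on the boundary unspecified, and the name `point_in_ring` is hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: return True if (x, y) is strictly inside the ring.

    `ring` is a list of (x, y) vertices; it may or may not repeat the first
    vertex at the end.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the edge crosses the ray going right from the point
                inside = not inside
    return inside


risk_boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]
```

A point is inside when a ray cast from it crosses the boundary an odd number of times, so `point_in_ring(5, 5, risk_boundary)` is True and `point_in_ring(15, 5, risk_boundary)` is False.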


IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| inputGeometry | WritableGeometry | The input geometry to be checked for a null or empty value. |

Return Values

| Return Type | Description |
| --- | --- |
| Boolean | True if the geometry is null or empty; otherwise, False. |

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;


Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry | WritableGeometry | The input geometry. |
| areaUnits | String | The desired return unit type. For valid values, see Area Units. |
| computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. See the computation type notes that follow this table. |

The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for areaUnits are the following area units:

| Value | Description |
| --- | --- |
| sq mi | square miles |
| sq km | square kilometers |
| sq in | square inches |
| sq ft | square feet |
| sq yd | square yards |
| sq mm | square millimeters |
| sq cm | square centimeters |
| sq m | square meters |
| sq survey ft | square US Survey feet |
| sq nmi | square nautical miles |
| acre | acres |
| ha | hectares |

Return Values

| Return Type | Description |
| --- | --- |
| Double | The area of the geometry. |

Examples

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
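The rule that a polygon's area is the area of its exterior ring minus the areas of its interior rings can be sketched with the shoelace formula. The following Python sketch assumes planar (Cartesian) coordinates and returns the area in squared coordinate units. It is not the SDK's implementation, and the names `ring_area` and `polygon_area` are hypothetical.

```python
def ring_area(ring):
    """Unsigned area of a simple ring of (x, y) vertices using the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
area = polygon_area(square, [hole])
```

Here the 10 by 10 square has an area of 100 and the 2 by 2 hole has an area of 4, so the polygon's area is 96. A SPHERICAL computation on long/lat coordinates would need a different formula.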


ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry1 | WritableGeometry | The first instance of a WritableGeometry. |
| geometry2 | WritableGeometry | The second instance of a WritableGeometry. |

Return Values

| Return Type | Description |
| --- | --- |
| Array<WritableGeometry> | The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned. |

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;


Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry1 | WritableGeometry | The first instance of a WritableGeometry. |
| geometry2 | WritableGeometry | The second instance of a WritableGeometry. |
| linearUnits | String | The desired return unit type. For valid values, see Linear Units. |
| computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. See the computation type notes that follow this table. |

The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

| Value | Description |
| --- | --- |
| mi | miles |
| km | kilometers |
| in | inches |
| ft | feet |
| yd | yards |
| mm | millimeters |
| cm | centimeters |
| m | meters |
| survey ft | US Survey feet |
| nmi | nautical miles |

Return Values

| Return Type | Description |
| --- | --- |
| Double | The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative. |

Examples

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
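The difference between the CARTESIAN and SPHERICAL computation types can be illustrated by computing a point-to-point distance both ways. This Python sketch rests on assumptions: the spherical variant uses the haversine formula with a mean Earth radius, which may differ from the model the SDK uses, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumption)


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance on a plane, in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle (haversine) distance in meters between long/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude measured at the equator and at 60 degrees north.
at_equator = spherical_distance_m(0.0, 0.0, 1.0, 0.0)
at_60_north = spherical_distance_m(0.0, 60.0, 1.0, 60.0)
```

Treated as plane coordinates, both pairs of points are exactly 1 unit apart. On the sphere, the pair at 60 degrees north is only about half as far apart as the pair at the equator, which is why SPHERICAL is the default for geographic coordinate systems.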


Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry | WritableGeometry | The input geometry. |
| linearUnits | String | The desired return unit type. For valid values, see Linear Units. |
| computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. See the computation type notes that follow this table. |

The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

| Value | Description |
| --- | --- |
| mi | miles |
| km | kilometers |
| in | inches |
| ft | feet |
| yd | yards |
| mm | millimeters |
| cm | centimeters |
| m | meters |
| survey ft | US Survey feet |
| nmi | nautical miles |

Return Values

| Return Type | Description |
| --- | --- |
| Double | The length of the geometry. |

Examples

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
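As an illustration of how a polyline length relates to the linear unit codes, the following Python sketch sums segment lengths for coordinates given in meters on a plane and converts the total into the requested unit. It assumes a Cartesian computation and standard conversion factors. The dictionary `METERS_PER_UNIT` and the function `polyline_length` are hypothetical names, not SDK APIs.

```python
import math

# Meters per unit for the linear unit codes listed in this guide.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,
    "nmi": 1852.0,
}


def polyline_length(points_m, unit="m"):
    """Length of a polyline whose (x, y) vertices are in meters, in `unit`."""
    total_m = sum(math.dist(a, b) for a, b in zip(points_m, points_m[1:]))
    return total_m / METERS_PER_UNIT[unit]


route = [(0, 0), (3000, 4000), (3000, 5000)]
```

The route above consists of a 5000 m segment followed by a 1000 m segment, so `polyline_length(route, "km")` returns 6.0, and the same line expressed in feet is about 19685.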


Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry | WritableGeometry | The input geometry. |
| linearUnits | String | The desired return unit type. For valid values, see Linear Units. |
| computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. See the computation type notes that follow this table. |

The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

| Value | Description |
| --- | --- |
| mi | miles |
| km | kilometers |
| in | inches |
| ft | feet |
| yd | yards |
| mm | millimeters |
| cm | centimeters |
| m | meters |
| survey ft | US Survey feet |
| nmi | nautical miles |

Return Values

| Return Type | Description |
| --- | --- |
| Double | The perimeter of the geometry. |

Examples

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
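The statement that a polygon's perimeter includes its holes can be shown with a short Python sketch. It assumes planar coordinates and returns the result in coordinate units. It is an illustration rather than the SDK's implementation, and the names `ring_length` and `polygon_perimeter` are hypothetical.

```python
import math


def ring_length(ring):
    """Length of a closed ring of (x, y) vertices, including the closing edge."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def polygon_perimeter(exterior, interiors=()):
    """Sum of the lengths of the exterior ring and every interior ring (hole)."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in interiors)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
perimeter = polygon_perimeter(square, [hole])
```

The exterior ring contributes 40 and the hole contributes 8, so the perimeter is 48. Note that a hole increases the perimeter even though it decreases the area.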


Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object

Function Registration

create function Buffer as compbbigdataspatialhiveprocessingBuffer

Syntax

Buffer(WritableGeometry geometry Number offset String linearUnit Number resolution [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type For valid values see Linear Units on page 40

resolution Number Specifies how many straight-line segments are used in approximating a circle For example with a resolution of 8 the buffer of a point will be an octagon Buffers with larger resolution values take more time and space to compute

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 39

Spatial

Parameter Type Description

computationType String Optional Indicates the logic to be used to interpret geometry coordinates The computation type is based on the coordinate system of the geometry being operated on

bull For geographic (longlat) coordinate systems Valid type = SPHERICAL (default)

bull For projected coordinate systems Valid types = CARTESIAN SPHERICAL (default)

bull For engineering coordinate systems Valid type = CARTESIAN (default)

CARTESIAN The geometry coordinates are interpreted using cartesian logic

SPHERICAL The geometry coordinates are interpreted using spherical logic

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles


Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
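
The effect of the resolution parameter can be pictured with a small Python sketch. It is not the product's implementation: it assumes plain CARTESIAN coordinates, and the function name buffer_point is hypothetical. It only shows how a circle around a point is approximated by a fixed number of straight-line segments, so that a resolution of 8 yields an octagon.

```python
import math

def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y).

    Hypothetical helper: uses `resolution` straight-line segments in a
    flat (Cartesian) plane and returns a closed ring of (x, y) tuples.
    """
    ring = []
    for i in range(resolution):
        angle = 2.0 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle),
                     y + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

octagon = buffer_point(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)  # distinct vertices: 8
```

A larger resolution adds vertices in direct proportion, which is one way to see why larger values cost more time and space.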


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
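
To make the definition of a convex hull concrete, the following Python sketch computes the hull of a set of planar points with Andrew's monotone chain algorithm. It is an illustration of the concept only; the guide does not say which algorithm the ConvexHull UDF uses, and the function names here are hypothetical.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) is discarded because it does not lie on the smallest convex outline of the input.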


Intersection

Description

The Intersection function returns the geometry (a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
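
For readers who want to see what a transformation between the two coordinate systems in these examples involves, here is a Python sketch of the standard spherical Web Mercator forward formula (epsg:4326 longitude/latitude to epsg:3857 meters). It is a textbook formula, not the product's code, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS 84 longitude/latitude in degrees to EPSG:3857 meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y

print(lonlat_to_web_mercator(30.0, 20.0))
```

The formula is only valid away from the poles, where the projected Y value grows without bound.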


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make it possible to transform an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X

• ST_XMax

• ST_XMin

• ST_Y

• ST_YMax

• ST_YMin
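
The MBR filtering use case described above can be sketched in Python. This is a conceptual illustration only: the helper names are hypothetical, a geometry is represented as a plain list of (x, y) vertices, and the four values returned by mbr correspond to what ST_XMin, ST_XMax, ST_YMin, and ST_YMax report.

```python
def mbr(vertices):
    """Return (xmin, xmax, ymin, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), max(xs), min(ys), max(ys)

def filter_by_mbr(rows, vertices):
    """Keep the (x, y) rows that fall inside the geometry's MBR."""
    xmin, xmax, ymin, ymax = mbr(vertices)
    return [(x, y) for x, y in rows
            if xmin <= x <= xmax and ymin <= y <= ymax]

triangle = [(0, 0), (4, 0), (2, 3)]
xy_table = [(1, 1), (5, 1), (2, 3), (2, -1)]
print(filter_by_mbr(xy_table, triangle))
```

An MBR test is a coarse filter: a row can pass it and still lie outside the geometry itself, so an exact test may be needed afterwards.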


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

Three types of UDFs are provided, one for each of three grid cell shapes:

• rectangular (geohash)

• hexagon (hexagon hash)

• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary

• GeoHashID

• HexagonBoundary

• HexagonID

• SquareHashBoundary

• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
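
The relationship between a geohash ID and its rectangular cell can be illustrated with a short Python sketch of the publicly documented geohash decoding scheme. Whether the GeoHashBoundary UDF follows this scheme bit for bit is an assumption; the function name geohash_bounds is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_bounds(geohash):
    """Decode a geohash into (lon_min, lat_min, lon_max, lat_max)."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2.0
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2.0
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi

print(geohash_bounds("ezs42"))
```

Each additional character halves the cell several more times, which is why longer IDs describe smaller cells.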


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
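
The last query above groups points by cell ID and counts them. The same idea can be sketched in Python using the publicly documented geohash encoding. This is an illustration under the assumption that GeoHashID follows the standard scheme; the function name geohash_id and the sample points are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of `precision` chars."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars, value, bits = [], 0, 0
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2.0
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2.0
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | bit
        bits += 1
        is_lon = not is_lon
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)

# Count points per cell, mirroring the GROUP BY query above.
points = [(-73.750333, 42.736103), (-73.750400, 42.736200), (-5.6, 42.6)]
counts = Counter(geohash_id(lon, lat, 5) for lon, lat in points)
print(counts)
```

Because a shorter ID is a prefix of every longer ID for the same point, sorting by ID keeps nearby cells close together in many cases.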


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon

• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset: the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs: the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup: the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
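
The core test behind a point-in-polygon search can be sketched in Python with the classic ray-casting (even-odd) rule. This is a conceptual sketch only: it handles a single ring without holes in planar coordinates, it does not read TAB or shapefiles, and the function names and sample data are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Even-odd ray casting: True if (x, y) is inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def find_containing_features(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    return [attrs for ring, attrs in features
            if point_in_ring(point[0], point[1], ring)]

features = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], {"state": "A"}),
    ([(10, 10), (14, 10), (14, 14), (10, 14)], {"state": "B"}),
]
print(find_containing_features((2, 2), features))
```

Testing every feature for every point is the slow path; a prepared index, which is the role the Tip above assigns to PGD files, avoids most of those tests.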


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates: the maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance: the maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit: the distance unit (if not set, the default value is m for meters). See the Distance on page 33 function for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName: the name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset: the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs: the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup: the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter: search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like "Park"'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like "Park"')) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
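
How the maxCandidates and maxDistance options interact can be sketched in Python as a brute-force nearest search over point features. This is an illustration, not the product's algorithm: it assumes longitude/latitude input, measures great-circle distance in meters with the haversine formula on a sphere, and all names and sample data are hypothetical.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a sphere of mean earth radius."""
    radius_m = 6371008.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2)
    return 2.0 * radius_m * math.asin(math.sqrt(a))

def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, name) pairs, nearest first."""
    scored = []
    for name, lon, lat in candidates:
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return scored[:max_candidates]

capitals = [("A", 0.0, 0.0), ("B", 0.0, 1.0), ("C", 0.0, 5.0)]
print(search_nearest((0.0, 0.1), capitals, max_candidates=3,
                     max_distance_m=200000.0))
```

The distance limit is applied before the candidate limit, so a search can return fewer than max_candidates results, or none at all.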


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the level number, the higher the level and the larger the hexagon. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always yields a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
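To make the join semantics concrete, the following is a minimal Python sketch of a distance join, not the product's implementation. It assumes a brute-force comparison of every pair and a haversine (spherical) distance; the function names, the sample rows, and the "distance" output key are all hypothetical. The real method uses a geohash-based primary join rather than comparing every pair.

```python
import math

EARTH_RADIUS_MI = 3958.7613  # mean Earth radius in miles (spherical approximation)


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Pair each left row with every right row that lies within max_miles."""
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_miles(l_row["longitude"], l_row["latitude"],
                                r_row["lon"], r_row["lat"])
            if d <= max_miles:
                # Merge both rows and attach the computed distance.
                joined.append({**l_row, **r_row, "distance": d})
    return joined


# Hypothetical sample data: one customer and two points of interest.
customers = [{"customer": "A", "longitude": -73.6918, "latitude": 42.7284}]
pois = [
    {"poi": "cafe", "lon": -73.6950, "lat": 42.7300},     # a few hundred meters away
    {"poi": "airport", "lon": -73.8017, "lat": 42.7483},  # several miles away
]
result = join_by_distance(customers, pois, 0.5)
```

Only the cafe falls inside the half-mile radius, so the result holds a single joined row. Comparing every pair scales poorly, which is why a primary join on a coarser spatial key is useful before the exact distance check.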

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
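The effect of geohashPrecision is easier to see with a small encoder. The sketch below assumes the standard public geohash scheme (base-32 alphabet, bits alternating between longitude and latitude); the guide does not state which variant the product uses internally, and geohash_encode is a hypothetical helper, not part of the SDK.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a WGS84 point as a geohash string with `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, 5)
fine = geohash_encode(57.64911, 10.40744, 7)
```

Each extra character splits a cell into 32 smaller cells, so a higher precision yields many more distinct keys for the same data. That is consistent with the note that higher values may require more memory.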


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment the following line and change it from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
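As a quick check on what the 775 mode grants, the following Python sketch decodes the octal value with the standard library stat module. It inspects only the number itself and touches no files; the variable names are illustrative.

```python
import stat

mode = 0o775  # octal permission bits used in the chmod step

# S_IFDIR marks the entry as a directory so filemode() prints a leading "d".
listing = stat.filemode(stat.S_IFDIR | mode)

# Owner and group both hold read, write, and execute.
owner_rwx = (mode & stat.S_IRWXU) == stat.S_IRWXU
group_rwx = (mode & stat.S_IRWXG) == stat.S_IRWXG

# Everyone else can read and traverse the directory but cannot write to it.
other_can_write = bool(mode & stat.S_IWOTH)
```

The resulting listing string is the same form that ls -l shows for the directory. It confirms that members of the common group can update downloaded data while other users cannot.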


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
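The wildcard behavior described for the Like operator can be sketched by translating a pattern into a regular expression. This is an illustrative Python sketch only: the helper names are hypothetical, matching is assumed to be case-sensitive, and escape characters are not handled, since the table above specifies neither.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")   # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


starts_with = like("Canada", "Can%")
single_char = like("Canada", "C_nada")
too_short = like("Canada", "C_da")
```

The pattern must cover the entire value, which is why "C_da" does not match "Canada" while "C%da" would.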


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List item and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In cases where a single quote is within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
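When queries are assembled in code, the quoting rules above can be applied with two small helpers. This Python sketch is illustrative only: quote_literal and quote_identifier are hypothetical names, and the identifier helper does not handle a double quote inside a name, because the rules above do not cover that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
query = "SELECT * FROM " + table + " WHERE Country = " + quote_literal("Canada")
escaped = quote_literal("O'haras")
```

Here the table name needs double quotes because it contains slashes, while the country value takes single quotes as a string literal.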


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Boolean True if geometry2 entirely contains geometry1; otherwise False.

If either geometry1 or geometry2 is null, Null is returned.
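For the simplest case of a point tested against a polygon, the predicate can be illustrated with a ray-casting test. This Python sketch assumes planar (Cartesian) coordinates and leaves the result for points exactly on the boundary undefined; it is not the function's actual algorithm, and the helper names and sample rings are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray toward +x crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_within_polygon(x, y, exterior, holes=()):
    """A point is within a polygon if it is inside the exterior ring and outside every hole."""
    return point_in_ring(x, y, exterior) and not any(
        point_in_ring(x, y, hole) for hole in holes
    )


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]

in_body = point_within_polygon(2, 2, square, [hole])
in_hole = point_within_polygon(5, 5, square, [hole])
outside = point_within_polygon(12, 5, square, [hole])
```

A point that falls inside a hole is not within the polygon, even though it lies inside the exterior ring.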

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;


IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter Type Description

inputGeometry WritableGeometry The input geometry to be checked for a null or empty value

Return Values

Return Type Description

Boolean True if the geometry is null or empty; otherwise False

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;


Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
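The "exterior ring minus interior rings" rule can be sketched for planar coordinates with the shoelace formula. This Python example illustrates the rule only; it is not the function's implementation, it ignores spherical computation, and the helper names and sample rings are hypothetical.

```python
def ring_area(ring):
    """Unsigned area of a closed ring via the shoelace formula (planar coordinates)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]  # 10 x 10 square
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]          # 2 x 2 square cut out of it

area = polygon_area(exterior, [hole])
```

The square covers 100 square units and the hole removes 4, leaving 96.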

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

areaUnits String The desired return unit type. For valid values, see Area Units on page 30.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres


ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;


ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;


Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles
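To relate the unit codes in this table to one another, the following Python sketch converts a length between them through meters. The conversion factors are the standard definitions of each unit; the dictionary and function names are hypothetical and are not part of the product.

```python
# Meters per unit, keyed by the unit codes listed in the table above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two unit codes by going through meters."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


miles_in_km = convert_length(1.0, "mi", "km")
nmi_in_m = convert_length(1.0, "nmi", "m")
```

Note that "ft" and "survey ft" are not the same length; the difference is small but accumulates over long distances.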

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;


Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.
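The rule that holes add to the perimeter rather than subtract from it can be shown with a short planar sketch. This Python example assumes Cartesian coordinates and hypothetical helper names; it illustrates the definition, not the function's implementation.

```python
import math


def ring_length(ring):
    """Total length of a closed ring, including the edge back to the first vertex."""
    n = len(ring)
    total = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def polygon_perimeter(exterior, holes=()):
    """Sum of the lengths of all rings: the exterior ring plus every hole."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]  # 10 x 10 square
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]          # 2 x 2 hole

perimeter = polygon_perimeter(exterior, [hole])
```

The exterior ring contributes 40 units and the hole contributes 8, for a total of 48.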

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The unit of the offset distance. For valid values, see Linear Units on page 40.

resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
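
To make the resolution parameter concrete, the following Python sketch builds the buffer of a single point as a regular polygon. It is only an illustration under simplifying assumptions (planar coordinates, a hypothetical point_buffer helper) and does not reproduce how the Buffer UDF computes its result.

```python
import math


def point_buffer(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) using
    `resolution` straight-line segments. Planar coordinates are assumed."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = point_buffer(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)  # number of distinct vertices
```

A larger resolution follows the circle more closely but produces more vertices, which is why larger values cost more time and space.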

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(hivetable.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
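
As an illustration of what a convex hull is, the following Python sketch applies Andrew's monotone chain algorithm to the vertices of the MULTIPOLYGON used in the example above. This is a well-known textbook technique chosen for the sketch; the guide does not state which algorithm the ConvexHull UDF uses.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


vertices = [(40, 40), (20, 45), (45, 30), (20, 35), (10, 30), (10, 10),
            (30, 5), (45, 20), (30, 20), (20, 15), (20, 25)]
hull = convex_hull(vertices)
print(hull)
```

Vertices of the interior ring and of concave corners do not appear in the hull, because the hull keeps only the outermost points.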

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
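
For one specific pair of coordinate systems, epsg:4326 to epsg:3857, the transformation can be written in closed form. The Python sketch below uses the standard spherical Mercator formulas with the WGS 84 semi-major axis. It is a hand-written illustration of that single case, not the Transform UDF, which accepts arbitrary destination coordinate systems; the function name is hypothetical.

```python
import math

R = 6378137.0  # WGS 84 semi-major axis in meters


def lonlat_to_web_mercator(lon, lat):
    """Project a longitude/latitude in degrees to epsg:3857 meters.
    Valid only away from the poles (roughly |lat| < 85.06 degrees)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


x, y = lonlat_to_web_mercator(30, 20)
print(x, y)
```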

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the ordinates from the transformed point.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
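
The bounds-filtering use case can be sketched in a few lines of Python. The helper names and the sample records below are hypothetical; the sketch only shows the idea of deriving the four MBR values from a geometry's coordinates and using them to filter XY records.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def within_mbr(x, y, bounds):
    """True if the point lies inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax


polygon = [(40, 40), (20, 45), (45, 30), (40, 40)]
bounds = mbr(polygon)

# Illustrative XY records as (id, x, y).
records = [(1, 30.0, 35.0), (2, 50.0, 35.0), (3, 25.0, 44.0)]
inside = [rid for rid, x, y in records if within_mbr(x, y, bounds)]
print(bounds, inside)
```

An MBR test is a coarse filter: a record can fall inside the rectangle without falling inside the geometry itself.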

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
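
To show how an ID maps back to a rectangular cell, the following Python sketch decodes a string with the publicly documented geohash scheme (base-32 alphabet, bits alternating between longitude and latitude). It assumes the product's geohash IDs follow that standard scheme, which this guide does not confirm, and the function name is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of a standard geohash cell."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    even = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon if even else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the interval
            else:
                rng[1] = mid  # keep the lower half of the interval
            even = not even
    return lon[0], lat[0], lon[1], lat[1]


print(geohash_bounds("dre"))
```

Each additional character halves the interval five more times, which is why longer IDs describe smaller cells.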

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
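
The last query above counts records per grid cell. The following Python sketch mirrors that pattern with the publicly documented geohash encoding. It assumes the product's IDs follow the standard scheme, which this guide does not confirm; the geohash_id function and the sample points are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude as a standard geohash of `precision` characters."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    even = True  # bits alternate, starting with longitude
    value = 0
    bits = 0
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = bits = 0
    return "".join(chars)


# Illustrative (lon, lat) points; count how many fall into each cell.
points = [(-73.750333, 42.736103), (-73.75, 42.74), (10.40744, 57.64911)]
counts = Counter(geohash_id(lon, lat, 3) for lon, lat in points)
print(counts)
```

Because nearby points share an ID prefix, grouping by ID aggregates records by location without any geometry comparison.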

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Each option is listed with its description and an example.

shpCharset: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

shpCrs: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation: The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup: The operating system group that should be applied to downloaded data on a local file system. The default is the value from the Hive setting pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
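
The core test behind a point-in-polygon search can be sketched with the classic ray-casting technique. The Python below is an illustration only: the feature list stands in for the rows of a TAB or shapefile, all names are hypothetical, and the sketch does not treat points that lie exactly on a boundary the way the UDTF is described to.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon_search(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    x, y = point
    return [attrs for ring, attrs in features if point_in_ring(x, y, ring)]


# Hypothetical features as (closed ring, attributes).
features = [
    ([(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)], {"state": "A"}),
    ([(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)], {"state": "B"}),
]
print(point_in_polygon_search((1, 1), features))
```

Running this test against every feature for every input point is expensive, which is the kind of cost that a prepared index is meant to reduce.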

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Each option is listed with its description and an example.

maxCandidates: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'

maxDistance: The maximum distance to search for results (if not set, the default value is no limit).
Example: 'maxDistance', '25'

distanceUnit: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'

returnDistanceColumnName: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

shpCharset: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

shpCrs: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation: The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup: The operating system group that should be applied to downloaded data on a local file system. The default is the value from the Hive setting pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

queryFilter: Search on a subset of the data based on an attribute.
Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads',
'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
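
The effect of the maxCandidates and maxDistance options can be sketched with a brute-force search. The Python below is an illustration, not the UDTF's implementation: it assumes point features, a mean Earth radius, and haversine distances in meters. The function names are hypothetical and the capital coordinates are approximate values included only as sample data.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def search_nearest(point, features, max_candidates=1, max_distance_m=None):
    """Return up to `max_candidates` (distance, attributes) pairs, nearest first."""
    lon, lat = point
    scored = []
    for (flon, flat), attrs in features:
        d = haversine_m(lon, lat, flon, flat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, attrs))
    scored.sort(key=lambda item: item[0])
    return scored[:max_candidates]


# Approximate capital locations as ((lon, lat), attributes); sample data only.
capitals = [
    ((-71.0589, 42.3601), {"capital": "Boston"}),
    ((-73.7562, 42.6526), {"capital": "Albany"}),
    ((-72.6734, 41.7658), {"capital": "Hartford"}),
]
results = search_nearest((-73.750333, 42.736103), capitals, max_candidates=3)
print([attrs["capital"] for _, attrs in results])
```

As with the documented defaults, omitting max_candidates returns a single result and omitting max_distance_m applies no distance limit.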

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always yields a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter          Type        Description
df2                DataFrame   The dataframe to join to.
df1Longitude       Column      The longitude value from the first dataframe.
df1Latitude        Column      The latitude value from the first dataframe.
df2Longitude       Column      The longitude value from the second dataframe.
df2Latitude        Column      The latitude value from the second dataframe.
maxDistance        Length      The buffer length around point 1 to search for point 2.
geohashPrecision   Integer     The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options            Map         Optional. Options that add extra attributes to the result of the join.
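To get a feel for the geohashPrecision range, the sketch below computes the cell size of a standard geohash at a given precision. It assumes the conventional encoding (5 bits per character, bits alternating between longitude and latitude starting with longitude); how the product uses geohashes internally is not described here, so treat this as background rather than as the implementation.

```python
def geohash_cell_size_degrees(precision: int):
    """Return (width, height) in degrees of a standard geohash cell."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    total_bits = 5 * precision       # each geohash character carries 5 bits
    lon_bits = (total_bits + 1) // 2  # longitude takes the first and every other bit
    lat_bits = total_bits // 2
    return 360.0 / (1 << lon_bits), 180.0 / (1 << lat_bits)


for precision in (1, 5, 7, 12):
    width, height = geohash_cell_size_degrees(precision)
    print(f"precision {precision}: {width:.9f} x {height:.9f} degrees")
```

Each extra character divides a cell into 32 smaller cells, so higher precision means many more distinct cells to track, which is consistent with the note about memory.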


Options

Key                  Type     Description
DistanceColumnName   String   Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type   Description
DataFrame     The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case:

Application         Do this
Spark job           Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook   Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook        Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter       Required   Description
-f <file>       yes        Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel>   no         The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
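To see what the 775 mode in the last step grants, the octal value can be decoded with the Python standard library. This is only an illustration of the permission bits; it does not touch any directory, and the helper name is hypothetical.

```python
import stat


def describe_directory_mode(mode: int) -> str:
    """Render permission bits the way `ls -l` shows them for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


mode = 0o775
listing = describe_directory_mode(mode)
group_can_write = bool(mode & stat.S_IWGRP)   # needed so group members can update data
others_can_write = bool(mode & stat.S_IWOTH)  # users outside the group stay read-only
print(listing, group_can_write, others_can_write)
```

The owner and the group get read, write, and execute, while all other users get read and execute only, which is why every service user that must update the data needs to be in the common group.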


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator              Definition
Attribute operators   =, <>, !=, <, <=, >, >=
Between               Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects   Returns true if the envelopes (MBRs) of the operands intersect.
Contains              Returns true if the first object contains all of the second object.
Within                Returns true if the first object is entirely inside the second object.
ContainsCentroid      Returns true if the first object contains the centroid of the second object.
CentroidWithin        Returns true if the first object's centroid is within the second object.
Intersects            Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)             Returns true if the value equals at least one of the values in the literal list or subquery.
Like                  Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                   Returns true if both conditions in the WHERE clause are true.
OR                    Returns true if either the first or second condition is true.
NOT                   Reverses the meaning of the logical operator with which it is used.
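The wildcard rules for the Like operator can be illustrated by translating a pattern into a regular expression. This is a minimal sketch of the semantics described in the table (underscore matches exactly one character, percent matches zero or more); it is not the MI SQL engine, it does not cover any escape syntax, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Like pattern into an equivalent compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like_matches(pattern: str, value: str) -> bool:
    # fullmatch: the pattern must account for the entire value
    return like_to_regex(pattern).fullmatch(value) is not None


print(like_matches("Tor%", "Toronto"))     # prefix match
print(like_matches("T_ronto", "Toronto"))  # single-character wildcard
print(like_matches("T_", "Tor"))           # too long for one underscore
```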


Arithmetic Operators

Operator Definition

+     Addition; also the concatenation operator. NOTE: String concatenation also uses &.
-     Subtraction
*     Multiplication
/     Division
^     Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter   Definition
( )         Expression delimiters
' '         String constant delimiters. See Quote Rules on page 85.
" "         Quoted identifier delimiters
% _         Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.
,           List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
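When queries are built programmatically, the double-single-quote rule is easy to apply with a small helper. The sketch below is an assumption-level illustration: `quote_literal` is a hypothetical helper, and the table and column names are placeholders rather than names from any real data set.

```python
def quote_literal(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


# Hypothetical table and column names, used only to show the resulting text.
business = "O'Brien"
query = 'SELECT * FROM "Streets" WHERE "Business" = ' + quote_literal(business)
print(query)
```

Doubling the quote keeps the literal intact instead of letting the embedded apostrophe terminate the string early.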


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Parameter       Type               Description
inputGeometry   WritableGeometry   The input geometry to be checked for a null or empty value.

Return Values

Return Type   Description
Boolean       True if the geometry is null or empty, otherwise False.

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT('POINT(10 20)'));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;


Measurement Functions

The following Measurement functions are available:

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
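The exterior-minus-interior rule can be worked through on a small planar example. The following is a Cartesian sketch using the shoelace formula; it is not the product's implementation and ignores units and spherical computation. The function names are hypothetical.

```python
def ring_area(ring):
    """Unsigned area of a closed ring given as (x, y) vertices (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


exterior = [(0, 0), (4, 0), (4, 4), (0, 4)]  # 4 x 4 square
hole = [(1, 1), (2, 1), (2, 2), (1, 2)]      # 1 x 1 square hole
print(polygon_area(exterior, [hole]))
```

Here the 16-unit square with a 1-unit hole yields an area of 15 square units.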

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter         Type               Description
geometry          WritableGeometry   The input geometry.
areaUnits         String             The desired return unit type. For valid values, see Area Units on page 30.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units:

Value          Description
sq mi          square miles
sq km          square kilometers
sq in          square inches
sq ft          square feet
sq yd          square yards
sq mm          square millimeters
sq cm          square centimeters
sq m           square meters
sq survey ft   square US Survey feet
sq nmi         square nautical miles
acre           acres
ha             hectares

Return Values

Return Type   Description
Double        The area of the geometry.

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;


ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.

Return Values

Return Type               Description
Array<WritableGeometry>   The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;


Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter         Type               Description
geometry1         WritableGeometry   The first instance of a WritableGeometry.
geometry2         WritableGeometry   The second instance of a WritableGeometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units on page 33.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
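The difference between the two computation types can be illustrated for point-to-point distance. The sketch below assumes a spherical Earth with a mean radius of 6,371,008.8 m and uses the haversine formula for the spherical case; the product's actual algorithms and constants are not documented here, so both functions are illustrative only.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius, not taken from the product


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the (projected) coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


one_degree_north = spherical_distance_m(0.0, 0.0, 0.0, 1.0)
planar = cartesian_distance(0.0, 0.0, 3.0, 4.0)
print(f"{one_degree_north:.1f} m, {planar}")
```

Treating degrees as planar units would give a distance of 1 for the first pair, which is why spherical logic is the only valid choice for geographic coordinate systems.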

Linear Units

The following table lists the valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles
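The unit codes in the table can be related to one another through their length in meters. The following sketch uses the standard international definitions of these units; the dictionary and function are hypothetical helpers, not part of the product, which performs unit handling internally.

```python
# Meters per unit, keyed by the unit codes listed in the table above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the unit codes above."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert_length(1.0, "mi", "km"))
print(convert_length(1.0, "nmi", "m"))
```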

Return Values

Return Type Description

Double        The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;


Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter         Type               Description
geometry          WritableGeometry   The input geometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units on page 35.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double        The length of the geometry.

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter         Type               Description
geometry          WritableGeometry   The input geometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units on page 37.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double        The perimeter of the geometry.

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object

Function Registration

create function Buffer as compbbigdataspatialhiveprocessingBuffer

Syntax

Buffer(WritableGeometry geometry Number offset String linearUnit Number resolution [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The linear unit of the offset. For valid values, see Linear Units on page 40.

resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
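The effect of the resolution parameter can be sketched in plain Cartesian terms: a point buffer is a circle approximated by a polygon with that many straight segments. The code below is an illustrative assumption, not the product's implementation (which may, for example, place vertices differently or work on the sphere); buffer_point and ring_area are hypothetical names.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Closed ring approximating a circle of radius `offset` around (x, y)
    with `resolution` straight-line segments (Cartesian logic)."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


def ring_area(ring):
    """Area of a closed ring using the shoelace formula."""
    return abs(sum(ax * by - bx * ay for (ax, ay), (bx, by) in zip(ring, ring[1:]))) / 2
```

With a resolution of 8 the ring has eight segments (an octagon). Raising the resolution brings the area closer to that of the true circle, at the cost of more vertices to store and process.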

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
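As an illustration of what a convex hull is, the sketch below computes one for a set of planar points using Andrew's monotone chain algorithm. This is a generic textbook technique chosen for the example; the document does not state which algorithm the ConvexHull UDF uses.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull of (x, y) points, counterclockwise, without a repeated
    closing vertex (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Interior points, such as the vertices of a hole in a polygon, never appear in the result because they lie inside the hull of the outer vertices.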

Intersection

Description

The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
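For intuition, the sketch below intersects two planar polygons with the Sutherland-Hodgman clipping algorithm. It is a simplified stand-in: it requires the clip polygon to be convex and counterclockwise, whereas the Intersection UDF accepts general geometries and its internal method is not described in this guide.

```python
def clip_polygon(subject, clip):
    """Intersect `subject` with the convex, counterclockwise polygon `clip`.
    Both are lists of (x, y) vertices without a repeated closing vertex."""

    def inside(p, a, b):
        # True when p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def line_intersection(p, q, a, b):
        # Point where segment p-q crosses the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break  # nothing left to clip
        current, output = output, []
        prev = current[-1]
        for cur in current:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(line_intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(line_intersection(prev, cur, a, b))
            prev = cur
    return output
```

Clipping the square with corners (0, 0) and (2, 2) against the square with corners (1, 1) and (3, 3) yields the unit square between (1, 1) and (2, 2), which is the set of positions common to both inputs.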

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
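The examples above convert to epsg:3857 (spherical Web Mercator). For a single long/lat point in WGS 84, that particular transformation reduces to the well-known formulas sketched below. This covers only this one pair of coordinate systems and is not how the Transform UDF is implemented; the function name is an assumption.

```python
import math

WGS84_SEMI_MAJOR_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Project a WGS 84 long/lat point (degrees) to EPSG:3857 meters."""
    x = WGS84_SEMI_MAJOR_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

Because X depends only on longitude and Y only on latitude, an XY table can be projected row by row with these formulas once the ordinates are available as columns.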

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by returning the ordinates of the transformed point geometry.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) of a WritableGeometry.

The following Observer index functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
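The bounds-filtering use case can be sketched outside Hive as follows. The mbr function plays the role of the four ST_XMin, ST_YMin, ST_XMax, and ST_YMax calls, and filter_by_mbr keeps only the XY records that fall inside those bounds. The record layout (dictionaries with "x" and "y" keys) and both function names are assumptions made for the illustration.

```python
def mbr(coords):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(records, coords):
    """Keep the XY records whose point lies within the MBR of a geometry."""
    xmin, ymin, xmax, ymax = mbr(coords)
    return [r for r in records if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]
```

An MBR test is only a coarse filter: a record inside the rectangle may still lie outside the geometry itself, so an exact test is usually applied to the surviving records.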

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs, one for each of three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
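The relationship between a geohash ID and its cell boundary can be sketched as follows, assuming the UDF follows the standard public geohash scheme (base-32 characters whose bits alternately halve the longitude and latitude ranges). That assumption, and the function name geohash_bounds, are not confirmed by this guide.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of a standard geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi
```

Each additional character adds five bits, so the cell shrinks by a factor of 32 in area. This is why longer IDs mean higher precision and smaller cells.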

GeoHashID

Description

The GeoHashID function returns the unique, well-known string ID of the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
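The last query above groups points by cell ID and counts them. The sketch below reproduces that pattern in plain Python, again assuming the standard public geohash encoding; whether GeoHashID produces exactly these strings is not confirmed here, and geohash_id and count_by_cell are illustrative names.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a long/lat point as a standard geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bits, is_lon = 0, 0, True
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = value * 2 + 1, mid
            else:
                value, lon_hi = value * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = value * 2 + 1, mid
            else:
                value, lat_hi = value * 2, mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def count_by_cell(points, precision):
    """Count (lon, lat) points per geohash cell, like GROUP BY hashID."""
    return Counter(geohash_id(lon, lat, precision) for lon, lat in points)
```

Because nearby points share a common ID prefix, sorting by ID (as the ORDER BY hashID examples do) tends to keep spatially close records together.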

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns the unique, well-known string ID of the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID('-73.750333', '42.736103', 3);

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
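One widely used way to build cells that appear square in a Mercator display is the quadtree "quadkey" scheme used by web map tiles, in which each level splits a cell into four and appends one digit from 0 to 3. The sketch below implements that public scheme purely as an illustration. This guide does not state that SquareHashID uses quadkeys, so treat the correspondence, and the function names, as assumptions.

```python
import math

MAX_MERCATOR_LAT = 85.05112878  # latitude limit of the square Web Mercator world


def tile_quadkey(tile_x, tile_y, level):
    """Quadkey string for a tile: one base-4 digit per level, coarsest first."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if tile_x & mask else 0) + (2 if tile_y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


def lonlat_quadkey(lon, lat, level):
    """Quadkey of the Web Mercator tile containing a WGS 84 long/lat point."""
    n = 2 ** level
    lat = max(-MAX_MERCATOR_LAT, min(MAX_MERCATOR_LAT, lat))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edge still fall in the last tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return tile_quadkey(x, y, level)
```

As with geohash, the ID of a cell is a prefix of the IDs of all the cells it contains, so truncating an ID moves to a coarser cell.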

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group variable (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
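Conceptually, a point-in-polygon search tests the input point against each candidate polygon and returns the attributes of those that contain it. The sketch below shows the classic ray-casting test on planar rings. It is a simplification that ignores holes, points exactly on a boundary, and spatial indexing, and it is not the UDTF's implementation; the feature layout (dictionaries with a "ring" key) is an assumption.

```python
def point_in_ring(x, y, ring):
    """Ray casting: True if (x, y) is strictly inside the ring of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does a horizontal ray from the point towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon_search(point, features):
    """Return every feature whose ring contains the point."""
    return [f for f in features if point_in_ring(point[0], point[1], f["ring"])]
```

Running this test against every polygon for every point is expensive on large data sets, which is the motivation for prepared index files such as the PGD files mentioned in the tip above.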

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group variable (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
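The interaction of the maxCandidates, maxDistance, and distanceUnit options can be sketched as a brute-force nearest search over point candidates. This is only a model of the option semantics described in the table; the real UDTF searches arbitrary geometries in a TAB file or shapefile, and the haversine distance, the candidate layout, and the function names are assumptions.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumption)
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344}


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance=None, distance_unit="m"):
    """Return up to max_candidates (distance, candidate) pairs, nearest first.
    Distances are expressed in distance_unit; max_distance=None means no limit."""
    lon, lat = point
    scored = []
    for c in candidates:
        d = haversine_m(lon, lat, c["lon"], c["lat"]) / METERS_PER_UNIT[distance_unit]
        if max_distance is None or d <= max_distance:
            scored.append((d, c))
    scored.sort(key=lambda pair: pair[0])
    return scored[:max_candidates]
```

The defaults mirror the table: one result, no distance limit, and meters as the unit. A search point with no candidate inside max_distance yields an empty list, which corresponds to the rows that LATERAL VIEW OUTER preserves with null result columns.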

Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
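For intuition about what a geohash precision means, the following Python sketch implements the standard public geohash encoding (interleaved longitude/latitude bits, base-32 alphabet). It is an illustration only and makes no claim about how the SDK computes geohashes internally; the function name geohash_encode is hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lon, lat, precision):
    """Encode a WGS84 lon/lat pair as a geohash string of `precision` characters."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, ch, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude and latitude, starting with longitude.
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)

short_hash = geohash_encode(-5.6, 42.6, 5)
long_hash = geohash_encode(-5.6, 42.6, 7)
```

Each additional character splits the previous cell into 32 smaller cells, so a longer hash always starts with the shorter one. A higher precision therefore produces many more, smaller cells, which is consistent with the note that higher values may require more memory.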


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
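To make the semantics of a distance join concrete, here is a minimal plain-Python sketch. It is a brute-force illustration under stated assumptions, not the SDK's implementation (which uses a geohash-based primary join): the haversine formula on a spherical earth stands in for the distance calculation, and the function names, sample rows, and the 3958.8-mile earth radius are all hypothetical.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumed mean earth radius in miles

def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_mi):
    """Pair each left row with every right row that lies within max_distance_mi."""
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_mi(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_mi:
                # Mirror the DistanceColumnName option by keeping the distance.
                joined.append({**l_row, **r_row, "outputDistance": d})
    return joined

customers = [{"id": "c1", "longitude": -73.70, "latitude": 42.70}]
pois = [
    {"poi": "near", "lon": -73.705, "lat": 42.702},
    {"poi": "far", "lon": -74.50, "lat": 42.70},
]
result = join_by_distance(customers, pois, 0.5)
```

Only the "near" point falls inside the half-mile radius, so the result holds a single joined row carrying attributes from both inputs plus the computed distance.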


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue > Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.


-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
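As a quick check of what mode 775 grants, the following Python sketch decodes the octal value with the standard library's stat module. It only interprets the number and does not touch any directory.

```python
import stat

mode = 0o775  # the permission value used for the download directory

# Render the mode the way `ls -l` would show it for a directory.
symbolic = stat.filemode(stat.S_IFDIR | mode)

owner_has_rwx = (mode & stat.S_IRWXU) == stat.S_IRWXU
group_has_rwx = (mode & stat.S_IRWXG) == stat.S_IRWXG
others_can_write = bool(mode & stat.S_IWOTH)
```

The owner and the shared group get read, write, and execute access, while all other users can read and traverse the directory but cannot write to it. This is why every service user that downloads data needs to be a member of the common group.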


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first objects centroid is within the second object

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
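The wildcard rules of the Like operator can be sketched in Python by translating a pattern into a regular expression. This is an illustration of the semantics described in the table, not the MI SQL implementation; the function names are hypothetical, and case-sensitive matching is an assumption.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern: % matches any run of characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

starts_with_can = like("Canada", "Can%")
one_char_gap = like("Canada", "C_nada")
too_short = like("Canada", "C_da")
```

Here "Can%" and "C_nada" both match "Canada", while "C_da" does not, because each underscore stands for exactly one character.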


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples


Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
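When queries are assembled programmatically, the quoting rules can be applied with two small helpers. The Python sketch below is a minimal illustration with hypothetical function names; it doubles embedded single quotes in literals and simply wraps identifiers in double quotes (it does not handle identifiers that themselves contain a double quote).

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier (table, column, alias) in double quotes."""
    return '"' + name + '"'

query = (
    "SELECT * FROM " + quote_identifier("/Samples/NamedTables/USA")
    + " WHERE Country = " + quote_literal("Canada")
)
escaped = quote_literal("O'Hara")
```

For real applications, bound parameters are generally a safer choice than string concatenation where the calling API supports them.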


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes

3001 Summer Street

Stamford, CT 06926-0700

USA

www.pitneybowes.com



Measurement Functions

The following Measurement functions are available

• Area
• ClosestPoints
• Distance
• Length
• Perimeter

Area

Description

The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

areaUnits String The desired return unit type. For valid values, see Area Units on page 30.


computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres


ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
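The rule that a polygon's area is its exterior ring minus its interior rings can be illustrated with a small planar (CARTESIAN-style) sketch in Python. This is not the Area UDF's implementation and ignores units and spherical computation; the function names and the sample square are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(h) for h in holes)

# A 10 x 10 square with a 2 x 2 hole; rings repeat their first point to close.
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
area = polygon_area(square, [hole])
```

The square contributes 100 square units and the hole removes 4, giving 96.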


ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;


Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type For valid values see Linear Units on page 33

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries Geometries that intersect are at distance zero from each other Distance is always non-negative

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;


Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type For valid values see Linear Units on page 35

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type For valid values see Linear Units on page 37

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
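The statement that a polygon's perimeter sums the lengths of all its rings, holes included, can be sketched in planar (CARTESIAN-style) Python. This is an illustration only, not the Perimeter UDF's implementation; the function names and sample rings are hypothetical, and math.dist requires Python 3.8 or later.

```python
import math

def ring_length(ring):
    """Sum of straight-line segment lengths along a closed ring."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))

def polygon_perimeter(exterior, holes=()):
    """Perimeter counts the exterior ring and every interior ring (hole)."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)

# A 10 x 10 square with a 2 x 2 hole; rings repeat their first point to close.
exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
perimeter = polygon_perimeter(exterior, [hole])
```

The exterior ring measures 40 units and the hole adds 8, so a hole increases the perimeter even though it reduces the area.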


Processing Functions

The following Processing functions are available

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type For valid values see Linear Units on page 40

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.


computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles
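The unit codes in the table above correspond to fixed conversion factors. The following Python sketch is illustrative only and is not part of the product: the names METERS_PER_UNIT and convert are hypothetical, and the factors are the standard international definitions rather than values read from the SDK.

```python
# Meters per unit for each code in the Linear Units table.
# Assumption: standard international definitions, not values taken from the SDK.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert(value, from_unit, to_unit):
    """Convert a length between two unit codes by going through meters."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


if __name__ == "__main__":
    # A 50 km buffer offset expressed in meters and in miles.
    print(convert(50, "km", "m"))
    print(convert(50, "km", "mi"))
```

A lookup like this can help when checking that an offset given in one unit matches a distance reported in another.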

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
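The effect of the resolution parameter can be pictured with a small Python sketch. It is a planar illustration under stated assumptions, not the product's algorithm: buffer_point is a hypothetical helper that approximates a circle around a point with the given number of straight-line segments, so a resolution of 8 yields an octagon.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Return a closed ring approximating a circle of radius `offset`
    around (x, y) using `resolution` straight-line segments.

    Planar (cartesian) illustration only; hypothetical helper.
    """
    if resolution < 3:
        raise ValueError("resolution must be at least 3")
    ring = [
        (
            x + offset * math.cos(2 * math.pi * i / resolution),
            y + offset * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


if __name__ == "__main__":
    octagon = buffer_point(5.0, 6.0, 50.0, 8)
    print(len(octagon) - 1, "segments")
```

Larger resolution values add vertices to every buffered arc, which is why they cost more time and space.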

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
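To make the definition of a convex hull concrete, the sketch below computes one for a set of planar points in Python using Andrew's monotone chain algorithm. This is a generic textbook technique chosen for illustration; the source does not say which algorithm the ConvexHull UDF uses, and the function name convex_hull is hypothetical.

```python
def convex_hull(points):
    """Return the convex hull of planar points in counter-clockwise order
    (Andrew's monotone chain). Collinear boundary points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) does not appear in the result because it lies inside the smallest convex shape that contains all the input points.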

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
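As an illustration of what a transformation between the two coordinate systems used in these examples involves, the Python sketch below applies the standard spherical Web Mercator forward formula to convert longitude/latitude (epsg:4326) to epsg:3857 coordinates. It covers only this one pair of systems and is not the product's implementation; the function name lonlat_to_web_mercator is hypothetical.

```python
import math

# Radius of the sphere used by Web Mercator (the WGS 84 semi-major axis).
WEB_MERCATOR_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project WGS 84 longitude/latitude in degrees to epsg:3857 meters."""
    x = WEB_MERCATOR_RADIUS_M * math.radians(lon)
    y = WEB_MERCATOR_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat) / 2)
    )
    return x, y


if __name__ == "__main__":
    # Same point as the ST_POINT(30, 20) example above.
    print(lonlat_to_web_mercator(30.0, 20.0))
```

Values computed this way can serve as a rough cross-check on a Transform result for point data.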

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the ordinates of the transformed point.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
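The MBR filtering idea described above can be sketched in a few lines of Python. The sketch is a conceptual illustration, not product code: mbr and filter_by_mbr are hypothetical helpers, a geometry is reduced to a plain list of (x, y) coordinates, and an XY table is a list of dictionaries with x and y keys.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) coordinates,
    the same four values the ST_XMin/ST_YMin/ST_XMax/ST_YMax UDFs expose."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep the rows of an XY table whose point falls inside the bounds."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in rows if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


if __name__ == "__main__":
    triangle = [(1, 1), (4, 2), (3, 5)]
    table = [{"id": 1, "x": 2, "y": 2}, {"id": 2, "x": 5, "y": 2}]
    print(filter_by_mbr(table, mbr(triangle)))
```

Because the test involves only four numeric comparisons per row, it is a cheap first pass before any exact geometric test.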

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;
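The relationship between an ID, its cell boundary, and aggregation by cell can be sketched in Python using the public geohash convention (interleaved longitude/latitude bits, base-32 alphabet). This is an assumption made for illustration: the source describes the ID only as "well-known", and the helpers geohash_id and geohash_bounds are hypothetical stand-ins, not the product's code.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude as a geohash string of `precision` characters."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    out, value, bits, use_lon = [], 0, 0, True
    while len(out) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid
        else:
            value = value * 2
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits per base-32 character
            out.append(BASE32[value])
            value, bits = 0, 0
    return "".join(out)


def geohash_bounds(hash_id):
    """Decode a geohash into its cell bounds (xmin, ymin, xmax, ymax)."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_rng[0], lat_rng[0], lon_rng[1], lat_rng[1]


if __name__ == "__main__":
    points = [(-73.750333, 42.736103), (-73.75, 42.73), (-5.6, 42.6)]
    # Count points per cell, mirroring the GROUP BY query above.
    counts = Counter(geohash_id(x, y, 3) for x, y in points)
    for cell, quantity in counts.items():
        print(cell, geohash_bounds(cell), quantity)
```

Longer IDs share a prefix with shorter ones for the same point, which is what makes the IDs sortable and searchable by prefix.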

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file, which must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
    FromWKT(pip_points.geometry, pip_points.crs),
    'STATECAP.TAB',
    map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
        'downloadLocation', '/pb/pip/download',
        'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
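The core test behind a point-in-polygon search can be sketched in Python with the classic ray-casting method. This is a generic illustration under stated assumptions, not the UDTF's implementation: point_in_ring and point_in_polygon are hypothetical names, polygons are single closed rings of planar (x, y) vertices without holes, and points lying exactly on an edge are not given the special handling the product describes.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray from (x, y) toward +x crosses.
    `ring` is a closed list of (x, y) vertices (first vertex repeated at the end)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(point, table):
    """Return every row of `table` whose 'ring' contains the point,
    so overlapping polygons all appear in the result."""
    return [row for row in table if point_in_ring(point[0], point[1], row["ring"])]


if __name__ == "__main__":
    table = [
        {"state": "A", "ring": [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]},
        {"state": "B", "ring": [(10, 10), (12, 10), (12, 12), (10, 12), (10, 10)]},
    ]
    print([row["state"] for row in point_in_polygon((2, 2), table)])
```

Testing every polygon against every point in this way is slow for large tables; a prepared index reduces the number of polygons that need the exact test.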

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file, which must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option: maxCandidates
Description: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'

Option: maxDistance
Description: The maximum distance to search for results (if not set, the default value is no limit).
Example: 'maxDistance', '25'

Option: distanceUnit
Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'

Option: returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Option: queryFilter
Description: Search on a subset of the data based on an attribute.
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Example: 'queryFilter', 'Name Like Park'

Return Values

Return Type Description

geometry The geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map('maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
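How the maxCandidates and maxDistance options shape a result can be sketched in Python with a brute-force search. The sketch is conceptual and rests on assumptions: search_nearest and haversine_m are hypothetical helpers, candidate features are plain longitude/latitude points rather than arbitrary geometries, distances are great-circle distances in meters on a sphere of mean Earth radius, and no spatial index is used.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; an assumption of this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, rows, max_candidates=1, max_distance_m=None):
    """Return up to `max_candidates` (distance, row) pairs, nearest first,
    optionally discarding rows farther away than `max_distance_m`."""
    scored = []
    for row in rows:
        d = haversine_m(point[0], point[1], row["lon"], row["lat"])
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, row))
    scored.sort(key=lambda pair: pair[0])
    return scored[:max_candidates]


if __name__ == "__main__":
    rows = [
        {"name": "a", "lon": 0.1, "lat": 0.0},
        {"name": "b", "lon": 1.0, "lat": 0.0},
        {"name": "c", "lon": 5.0, "lat": 0.0},
    ]
    for distance, row in search_nearest((0.0, 0.0), rows, max_candidates=2):
        print(row["name"], round(distance))
```

With the defaults, a single nearest record is returned with no distance limit, which matches the default values listed for maxCandidates and maxDistance.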

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
    --master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
    /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
    -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrumtrade Location Intelligence for Big Data supports Levels 1 through 11 with level 9 as the default Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator It will generate each hexagon with a unique and consistent ID at a given level for the same longitudelatitude

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
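
To give a feel for what the geohashPrecision value controls, the following is a minimal sketch of the standard public geohash encoding, written in plain Scala with no Spark or SDK dependency. It is not the product's implementation; the object name GeohashSketch and the encode method are hypothetical names used only for illustration. Each additional character of precision subdivides the previous cell, so a higher precision produces many more, much smaller cells.

```scala
// Illustrative sketch only: standard geohash encoding, not the SDK's code.
object GeohashSketch {
  private val Base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

  /** Encodes a WGS84 longitude/latitude pair as a geohash of `precision` characters. */
  def encode(longitude: Double, latitude: Double, precision: Int): String = {
    require(precision >= 1 && precision <= 12, "precision must be between 1 and 12")
    var lonMin = -180.0; var lonMax = 180.0
    var latMin = -90.0; var latMax = 90.0
    val hash = new StringBuilder
    var useLongitude = true // bits alternate, starting with longitude
    var bits = 0
    var bitCount = 0
    while (hash.length < precision) {
      if (useLongitude) {
        val mid = (lonMin + lonMax) / 2
        if (longitude >= mid) { bits = (bits << 1) | 1; lonMin = mid }
        else { bits = bits << 1; lonMax = mid }
      } else {
        val mid = (latMin + latMax) / 2
        if (latitude >= mid) { bits = (bits << 1) | 1; latMin = mid }
        else { bits = bits << 1; latMax = mid }
      }
      useLongitude = !useLongitude
      bitCount += 1
      if (bitCount == 5) { // every 5 bits become one base-32 character
        hash.append(Base32.charAt(bits))
        bits = 0
        bitCount = 0
      }
    }
    hash.toString
  }
}
```

A shorter geohash is always a prefix of a longer one for the same point, which is why a lower precision groups more points into the same cell.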

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
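
The semantics of a distance join can be sketched without Spark as a naive pairwise comparison. The code below is only an illustration of the result a distance join is expected to produce: it compares every pair of points and uses a haversine distance with an assumed mean earth radius of 3958.8 miles. The names DistanceJoinSketch, Point, haversineMiles, and naiveDistanceJoin are hypothetical, and the real joinByDistance method may compute distance and prune candidate pairs differently (the geohashPrecision parameter suggests a geohash-based primary join).

```scala
// Illustrative sketch only: a naive pairwise distance join, not the SDK's algorithm.
object DistanceJoinSketch {
  final case class Point(id: String, longitude: Double, latitude: Double)

  // Assumed mean earth radius in miles.
  private val EarthRadiusMiles = 3958.8

  /** Great-circle (haversine) distance between two WGS84 points, in miles. */
  def haversineMiles(a: Point, b: Point): Double = {
    val dLat = math.toRadians(b.latitude - a.latitude)
    val dLon = math.toRadians(b.longitude - a.longitude)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(a.latitude)) * math.cos(math.toRadians(b.latitude)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusMiles * math.asin(math.sqrt(h))
  }

  /** Returns every (left, right, distance) triple whose distance is within maxMiles. */
  def naiveDistanceJoin(left: Seq[Point], right: Seq[Point], maxMiles: Double): Seq[(Point, Point, Double)] =
    for {
      l <- left
      r <- right
      d = haversineMiles(l, r)
      if d <= maxMiles
    } yield (l, r, d)
}
```

The third element of each result triple plays the role of the optional distance column in the joined output.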

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case:

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:

use_default_configuration=false

to:

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, and the directory should have 775 permissions, which give Read, Write, and Execute permissions to the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
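
As a reminder of what the octal mode 775 grants, the following small Scala sketch converts a three-digit octal mode into its symbolic form. It is an illustration only; PermissionSketch and toSymbolic are hypothetical names and are not part of the product.

```scala
// Illustrative sketch only: decodes an octal permission mode such as 775.
object PermissionSketch {
  /** Converts a three-digit octal mode (owner, group, others) to symbolic rwx form. */
  def toSymbolic(octalMode: String): String = {
    require(octalMode.matches("[0-7]{3}"), "expected three octal digits")
    octalMode.map { digit =>
      val bits = digit.asDigit
      (if ((bits & 4) != 0) "r" else "-") + // read
        (if ((bits & 2) != 0) "w" else "-") + // write
        (if ((bits & 1) != 0) "x" else "-")   // execute
    }.mkString
  }
}
```

For 775 the result is rwxrwxr-x: the owner and the group can read, write, and execute, while all other users can read and execute but not write.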

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
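
The wildcard rules for the Like operator can be illustrated by translating a pattern into a regular expression. The sketch below is not the MI SQL implementation: LikeSketch, likeToRegex, and like are hypothetical names, matching is assumed to be case-sensitive, and no escape character for the wildcards is handled.

```scala
import java.util.regex.Pattern

// Illustrative sketch only: Like-style wildcard matching via a regular expression.
object LikeSketch {
  /** Translates a Like pattern: % becomes ".*", _ becomes ".", other characters are literal. */
  def likeToRegex(pattern: String): String =
    "(?s)" + pattern.map {
      case '%' => ".*"                        // zero, one, or multiple characters
      case '_' => "."                         // exactly one character
      case c   => Pattern.quote(c.toString)   // everything else matches literally
    }.mkString

  /** Returns true when the whole value matches the Like pattern. */
  def like(value: String, pattern: String): Boolean =
    value.matches(likeToRegex(pattern))
}
```

For example, the pattern Can% matches Canada, and _at matches cat but not chat, because the underscore stands for exactly one character.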

Arithmetic Operators

Operator Definition

+ Addition; also concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers that contain illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
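
When queries are built programmatically, the quoting rules above can be applied with two small helpers. This is a minimal sketch, not part of the product: QuoteSketch, quoteLiteral, and quoteIdentifier are hypothetical names, and the identifier helper assumes the name itself contains no double-quote character.

```scala
// Illustrative sketch only: applies the single-quote and double-quote rules.
object QuoteSketch {
  /** Wraps a value in single quotes, doubling any embedded single quote. */
  def quoteLiteral(value: String): String =
    "'" + value.replace("'", "''") + "'"

  /** Wraps an identifier in double quotes (assumes no embedded double quote). */
  def quoteIdentifier(name: String): String =
    "\"" + name + "\""
}
```

For example, quoteLiteral applied to O'Brien yields 'O''Brien', which can be placed directly after an = operator in a WHERE clause.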

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

Parameter Type Description

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Area Units

Valid values for unit are the following area units

Value Description

sq mi square miles

sq km square kilometers

sq in square inches

sq ft square feet

sq yd square yards

sq mm square millimeters

sq cm square centimeters

sq m square meters

sq survey ft square US Survey feet

sq nmi square nautical miles

acre acres

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints'

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance'

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length'

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter'

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t
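
The rule that a polygon's perimeter is the sum of the lengths of all its rings can be sketched in plain Scala using Cartesian logic. This is an illustration of the definition only, not the UDF's implementation; PerimeterSketch, Ring, ringLength, and perimeter are hypothetical names, and each ring is assumed to be closed (its first vertex repeated at the end).

```scala
// Illustrative sketch only: Cartesian perimeter as the sum of ring lengths.
object PerimeterSketch {
  // A closed ring: the first vertex is repeated as the last vertex.
  type Ring = Seq[(Double, Double)]

  /** Sums the straight-line lengths of consecutive ring segments. */
  def ringLength(ring: Ring): Double =
    ring.zip(ring.drop(1)).map { case ((x1, y1), (x2, y2)) =>
      math.hypot(x2 - x1, y2 - y1)
    }.sum

  /** Perimeter of a polygon: exterior ring length plus the length of every hole. */
  def perimeter(exterior: Ring, holes: Seq[Ring]): Double =
    ringLength(exterior) + holes.map(ringLength).sum
}
```

A 4 x 4 square with a 1 x 1 hole therefore has a perimeter of 16 + 4 = 20 in coordinate units, because the hole's boundary counts toward the total.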

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer'

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL')
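
The effect of the resolution parameter on a point buffer can be sketched with Cartesian logic: the circle around the point is approximated by a closed ring of straight-line segments. The code below is an illustration under stated assumptions, not the UDF's implementation; BufferSketch and bufferPoint are hypothetical names, and placing the vertices exactly on the circle is an assumption about how the approximation is constructed.

```scala
// Illustrative sketch only: Cartesian point buffer as a regular polygon.
object BufferSketch {
  /** Returns a closed ring with `resolution` segments approximating a circle of radius `offset`. */
  def bufferPoint(x: Double, y: Double, offset: Double, resolution: Int): Seq[(Double, Double)] = {
    require(resolution >= 3, "a ring needs at least 3 segments")
    val vertices = (0 until resolution).map { i =>
      val angle = 2 * math.Pi * i / resolution
      (x + offset * math.cos(angle), y + offset * math.sin(angle))
    }
    vertices :+ vertices.head // repeat the first vertex to close the ring
  }
}
```

With a resolution of 8 the ring has 8 segments (an octagon); a resolution of 12, as in the examples above, gives a closer approximation of the circle at the cost of more vertices.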

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull'

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
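To make the notion of a convex hull concrete, the following Python sketch computes the hull of a set of planar vertices with Andrew's monotone chain algorithm. It is a generic illustration and is not the implementation behind the ConvexHull UDF; the function name convex_hull is hypothetical.

```python
def convex_hull(points):
    """Return the hull vertices of 2D points in counter-clockwise order
    (Andrew's monotone chain). Collinear boundary points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Z component of (a - o) x (b - o); positive means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Passing the vertices of the MULTIPOLYGON from the last example returns only the outermost vertices; the vertices of the inner ring are discarded because they lie inside the hull.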


Intersection

Description

The Intersection function returns the geometry (a point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
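As a rough illustration of what a coordinate transformation involves, the following Python sketch applies the standard spherical Mercator formulas that map longitude/latitude in degrees (epsg:4326) to meters in epsg:3857. It covers only this one pair of coordinate systems and is not the product's transformation engine; the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, as used by epsg:3857


def wgs84_to_web_mercator(lon, lat):
    """Project a longitude/latitude pair in degrees to epsg:3857 meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

Calling wgs84_to_web_mercator(30, 20) projects the point used in the second example above.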


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible: build a point from each row, transform it, and read the transformed ordinates back out.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X

• ST_XMax

• ST_XMin

• ST_Y

• ST_YMax

• ST_YMin
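The bounds-based filtering described above can be sketched in plain Python. The example below computes the minimum bounding rectangle of a list of vertices and tests whether a point falls inside it. It is a conceptual illustration only; the names mbr and within_bounds are hypothetical and do not correspond to product functions.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices,
    or None when the list is empty."""
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def within_bounds(x, y, bounds):
    """True if the point (x, y) lies inside or on the bounding rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax
```

A bounds test of this kind is cheap, so it is often used to discard records before a more expensive exact geometry comparison.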


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)

• hexagon (hexagon hash)

• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary

• GeoHashID

• HexagonBoundary

• HexagonID

• SquareHashBoundary

• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
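The relationship between a point, a precision, and a cell ID can be illustrated with the publicly documented geohash scheme, in which longitude and latitude bits are interleaved and every five bits become one base-32 character. The Python sketch below assumes that standard scheme; whether the GeoHashID and GeoHashBoundary UDFs produce identical IDs is not confirmed here, and the function names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Encode a point as a standard geohash string of the given length."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    chars, bits, bit_count, even = [], 0, 0, True
    while len(chars) < precision:
        if even:  # even bits halve the longitude range
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:     # odd bits halve the latitude range
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        even = not even
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_bounds(cell_id):
    """Decode a geohash string to (lon_min, lat_min, lon_max, lat_max)."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    even = True
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if even:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            even = not even
    return (lon_lo, lat_lo, lon_hi, lat_hi)
```

Because each additional character subdivides the parent cell, IDs that share a prefix belong to the same larger cell, which is what makes grouping by cell ID useful for aggregation.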


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID('-73.750333', '42.736103', 3);

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available

• LocalPointInPolygon

• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries in which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries in which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the search points. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
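The containment test at the heart of a point-in-polygon search can be sketched with the classic ray-casting rule: a point is inside a ring if a horizontal ray from the point crosses the ring's edges an odd number of times. The Python example below is a generic illustration for simple polygons without holes; it does not read TAB or shapefiles and is not the product's implementation, and the names point_in_ring and find_containing are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the closed ring of vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_containing(point, polygons):
    """Return the names of all polygons (name -> ring) that contain the point."""
    x, y = point
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]
```

Like the UDTF, find_containing can return several matches for one input point when polygons overlap.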


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like "Park"'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', 'Name Like "Park"')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the search points. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
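The effect of the maxCandidates and maxDistance options can be pictured with a brute-force nearest search. The Python sketch below assumes planar coordinates and straight-line distance, which is a simplification; the function name search_nearest and its parameters are hypothetical stand-ins for the options described above, not the UDTF itself.

```python
import heapq
import math


def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, name) pairs, nearest first.

    candidates maps a name to an (x, y) location in the same planar
    coordinate system as point. Candidates farther than max_distance
    are ignored; None means no distance limit.
    """
    px, py = point
    scored = []
    for name, (cx, cy) in candidates.items():
        distance = math.hypot(cx - px, cy - py)
        if max_distance is None or distance <= max_distance:
            scored.append((distance, name))
    return heapq.nsmallest(max_candidates, scored)
```

With the defaults, only the single closest candidate is returned, matching the documented default of 1 for maxCandidates.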


Spark

Spark Jobs

Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note The coordinate values must be in the CoordSysConstantslongLatWGS84 coordinate system

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 in which to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
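As a rough aid for choosing geohashPrecision, the following sketch prints the approximate cell size, in degrees, of a geohash at a given precision. It assumes the conventional geohash encoding (5 bits per character, with longitude taking the extra bit when the total is odd). The guide does not state how the join relates geohash cells to maxDistance, so treat the numbers as orientation only. The function name geohash_cell_degrees is hypothetical.

```shell
# Print "<width> <height>" in degrees for a geohash cell at precision $1.
# Assumption: standard geohash layout, 5 bits per character.
geohash_cell_degrees() {
  LC_ALL=C awk -v p="$1" 'BEGIN {
    bits = 5 * p
    lon_bits = int((bits + 1) / 2)   # longitude gets the extra bit when odd
    lat_bits = int(bits / 2)
    printf "%.6f %.6f\n", 360 / 2 ^ lon_bits, 180 / 2 ^ lat_bits
  }'
}

geohash_cell_degrees 7
```

At precision 7 this gives a cell of about 0.0014 degrees on each side, which is the same order of magnitude as a search radius of a few hundred meters.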


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

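This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

// Build the search distance (0.5 miles) from a value and a MapInfo unit code
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

// Join using geohash precision 7
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

// The DistanceColumnName option adds the calculated distance as the column outputDistance
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))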


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case:

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file
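The command above can be assembled from variables so that the jar path is set in one place. The following is a dry-run sketch that only prints the command. The install path, the driver class, the driver jar and the function name build_submit_command are placeholders or assumptions, and VERSION must be replaced with the actual jar version.

```shell
# Dry-run sketch: assemble the spark-submit command and print it.
# All values below are placeholders modeled on the example in this section.
LI_JAR="/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-VERSION.jar"
DRIVER_CLASS="com.example.spark.app.MyDriver"   # hypothetical driver class
DRIVER_JAR="/path/to/custom/driver.jar"         # hypothetical driver jar

build_submit_command() {
  printf 'spark-submit --class %s --master yarn --deploy-mode cluster --jars %s %s' \
    "$DRIVER_CLASS" "$LI_JAR" "$DRIVER_JAR"
  # Append any driver parameters passed to this function.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

build_submit_command driverParameter1 driverParameter2
```

Printing the command first makes it easy to review the class name and jar paths before submitting the job to the cluster.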


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:

use_default_configuration=false

to:

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
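The EMR edit in step 2 (uncommenting the setting and changing it to true) can be scripted. The sketch below works on a scratch copy rather than on the real hue.ini. The sample file content, the way the line is commented out and the function name enable_default_configuration are assumptions for illustration.

```shell
# Sketch of the EMR edit on a scratch file; the sample content is hypothetical.
tmp_ini=$(mktemp)
printf '%s\n' '[desktop]' '## use_default_configuration=false' > "$tmp_ini"

enable_default_configuration() {
  # Uncomment the setting (if commented) and set it to true.
  sed 's/^[#[:space:]]*use_default_configuration=.*/use_default_configuration=true/' "$1" > "$1.new" &&
    mv "$1.new" "$1"
}

enable_default_configuration "$tmp_ini"
cat "$tmp_ini"
```

On a real cluster, back up the configuration file before applying a change like this and review the result before restarting the service.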


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file


-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
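Because one PGD file is built per TAB file, indexing a whole directory means one PGDBuilder call per TAB. The following dry-run sketch prints those calls instead of running them. The scratch directory, the file names and the function name print_pgd_commands are hypothetical, and how PGDBuilder is launched on your platform is an assumption.

```shell
# Dry-run sketch: print one PGDBuilder command per TAB file in a directory.
# The directory and file names are hypothetical stand-ins.
data_dir=$(mktemp -d)
touch "$data_dir/uk.tab" "$data_dir/fr.tab"

print_pgd_commands() {
  for tab in "$1"/*.tab; do
    [ -e "$tab" ] || continue          # skip when no TAB files match
    echo "PGDBuilder -f $tab -p $2"
  done
}

print_pgd_commands "$data_dir" 4
```

The second argument is passed through as the -p value, so a lower number can be chosen when the machine is shared with other workloads.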


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute permissions (775) for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
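After changing permissions, it can help to confirm that the directory really carries mode 775. The sketch below applies the mode to a scratch directory, which stands in for the real download location, and prints the resulting permission string. The function name dir_mode is hypothetical.

```shell
# Sketch: apply mode 775 to a scratch directory and confirm the result.
# A temporary directory stands in for the real download location.
download_dir=$(mktemp -d)
chmod 775 "$download_dir"

# Print the permission string (for example drwxrwxr-x) of a directory.
dir_mode() {
  ls -ld "$1" | cut -c1-10
}

dir_mode "$download_dir"
```

A result of drwxrwxr-x corresponds to 775: read, write, and execute for the owner and group, and read and execute for others. On a real cluster, `id -nG hive` lists the groups of the hive user, which is a quick way to check that the group change has taken effect.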


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
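The wildcard behavior described for Like can be tried out with a small emulation that translates a pattern into an anchored regular expression. This is only an illustration of the matching rules, not the MI SQL implementation. It assumes the pattern contains only letters, digits, % and _, and like_match is a hypothetical helper name.

```shell
# Sketch: emulate Like matching by translating a pattern to a regex.
# Assumes the pattern has no regex metacharacters other than % and _.
like_match() {
  regex=$(printf '%s' "$2" | sed -e 's/%/.*/g' -e 's/_/./g')
  # -x forces the whole value to match, as a Like pattern does.
  printf '%s\n' "$1" | grep -qx -- "$regex"
}

like_match "Smith" "Sm_th" && echo "Sm_th matches Smith"
like_match "Smith" "S%"    && echo "S% matches Smith"
like_match "Smith" "Sm_"   || echo "Sm_ does not match Smith"
```

The last case fails because the underscore stands for exactly one character, while the percent sign in the second case absorbs any number of characters.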


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples


Identifiers that contain characters that would not normally be allowed are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
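The doubling rule for embedded single quotes is mechanical, so it can be applied with a small helper when queries are built in a script. The following sketch wraps a value in single quotes and doubles any single quote inside it. The helper name sql_quote is hypothetical.

```shell
# Sketch: double every single quote and wrap the value in single quotes.
sql_quote() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

sql_quote "O'haras"
```

The call prints 'O''haras', which is the literal form used in the example query.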


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Value Description

ha hectares

Return Values

Return Type Description

Double The area of the geometry

Examples

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;


ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;


Distance

Description

The Distance function calculates and returns the distance between two geometries

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
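To give a feel for what a spherical distance between two longitude/latitude points looks like, the sketch below computes a great-circle distance with the haversine formula. This is not the product's implementation: the formula, the mean Earth radius of 6371.0088 km and the function name haversine_km are assumptions chosen for illustration, and the Distance function may use a different model.

```shell
# Great-circle (haversine) distance in kilometers between two lon/lat points.
# Illustrative only: the formula and Earth radius are assumptions.
haversine_km() {
  LC_ALL=C awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" 'BEGIN {
    pi = atan2(0, -1); r = 6371.0088
    p1 = lat1 * pi / 180; p2 = lat2 * pi / 180
    dp = p2 - p1; dl = (lon2 - lon1) * pi / 180
    a = sin(dp / 2) ^ 2 + cos(p1) * cos(p2) * sin(dl / 2) ^ 2
    printf "%.3f\n", 2 * r * atan2(sqrt(a), sqrt(1 - a))
  }'
}

# One degree of longitude along the equator
haversine_km 0 0 1 0
```

One degree of longitude at the equator comes out at roughly 111 km, which is a handy sanity check when comparing results returned in different linear units.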


Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.


computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles


Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));


Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');


Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that an XY table cannot be transformed from one coordinate system to another directly. The ST_X and ST_Y UDFs make this possible by returning the ordinates of the transformed point.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
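The MBR idea behind ST_XMin, ST_XMax, ST_YMin, and ST_YMax can be illustrated outside Hive. The following is a minimal Python sketch, not the product's implementation: the function names mbr and filter_by_bounds and the sample data are hypothetical. It computes the bounding rectangle of a polygon's vertices and uses it to filter rows of an XY table.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]


def mbr(points: Iterable[Point]) -> Bounds:
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(rows: Iterable[Point], bounds: Bounds) -> List[Point]:
    """Keep only the XY rows that fall inside the bounding rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]


# Hypothetical polygon vertices and XY table rows.
polygon = [(0.0, 0.0), (4.0, 1.0), (3.0, 5.0), (-1.0, 2.0)]
bounds = mbr(polygon)
xy_table = [(1.0, 1.0), (5.0, 5.0), (-0.5, 4.5)]
inside_mbr = filter_by_bounds(xy_table, bounds)
```

A bounds test of this kind is only a coarse filter: a row can fall inside the MBR and still lie outside the geometry itself.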


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
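To make the encode/decode idea concrete, here is a minimal Python sketch of the publicly documented geohash scheme. It is an illustration under stated assumptions, not the product's code: the names geohash_id and geohash_boundary are hypothetical stand-ins, and the GeoHashID and GeoHashBoundary UDFs may differ in detail. The sketch encodes a longitude/latitude into a cell ID, decodes an ID back into the cell's bounds, and counts points per cell.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a WGS 84 longitude/latitude as a geohash of the given length."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    chars, value, bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude and latitude, halving the range each time.
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        bits += 1
        use_lon = not use_lon
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def geohash_boundary(cell_id):
    """Decode a geohash into (min_lon, min_lat, max_lon, max_lat)."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    use_lon = True
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


# Aggregation: count hypothetical sample points per cell at precision 5.
points = [(-5.6, 42.6), (-5.61, 42.61), (10.0, 50.0)]
counts = Counter(geohash_id(lon, lat, 5) for lon, lat in points)
```

Because a longer ID only refines a shorter one, the ID at precision 3 is a prefix of the ID at precision 5, which is what makes these keys convenient to sort, search, and group.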


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a grid cell, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a grid cell, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a grid cell, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);


SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID('-73.750333', '42.736103', 3);

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

shpCharset: the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs: the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup: the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP.TAB file and are the attributes we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
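For readers unfamiliar with the underlying test, the following Python sketch shows a standard ray-casting point-in-polygon check and a lookup that returns the attributes of every containing polygon, much as the query above returns capital and state. It is a simplified illustration only: the function names, the in-memory feature list, and the attribute values are hypothetical, and it does not read TAB files or shapefiles or reflect how the UDTF is implemented. Points that fall exactly on a polygon edge are not treated specially here.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: toggle 'inside' for each edge crossed by a ray going right."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def local_point_in_polygon(point, features):
    """Return the attributes of every feature whose polygon contains the point."""
    x, y = point
    return [attrs for ring, attrs in features if point_in_polygon(x, y, ring)]


# Hypothetical features: two overlapping squares with attribute records.
features = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], {"state": "A", "capital": "Alpha"}),
    ([(5, 5), (15, 5), (15, 15), (5, 15)], {"state": "B", "capital": "Beta"}),
]

one_match = local_point_in_polygon((2, 2), features)
two_matches = local_point_in_polygon((7, 7), features)
no_match = local_point_in_polygon((20, 20), features)
```

A point in the overlap of the two squares yields two result rows, which mirrors the statement above that all containing geometries are returned.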


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

maxCandidates: the maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance: the maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit: the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName: the name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset: the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs: the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup: the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

queryFilter: search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park'))
nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
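The effect of the maxCandidates and maxDistance options can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the UDTF's implementation: the function search_nearest and the feature list are hypothetical, the candidates are points rather than arbitrary geometries, and distances are planar (as if the coordinates were already projected).

```python
import heapq
import math


def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, attrs) pairs, nearest first."""
    px, py = point
    scored = []
    for (fx, fy), attrs in features:
        d = math.hypot(fx - px, fy - py)  # planar distance (assumption)
        # With no max_distance set, every feature is a candidate (no limit).
        if max_distance is None or d <= max_distance:
            scored.append((d, attrs))
    return heapq.nsmallest(max_candidates, scored, key=lambda item: item[0])


# Hypothetical point features with attribute records.
features = [
    ((0.0, 0.0), {"name": "A"}),
    ((3.0, 4.0), {"name": "B"}),
    ((6.0, 8.0), {"name": "C"}),
    ((30.0, 40.0), {"name": "D"}),
]

default_result = search_nearest((0.0, 0.0), features)
top_three = search_nearest((0.0, 0.0), features, max_candidates=3)
within_seven = search_nearest((0.0, 0.0), features, max_candidates=3, max_distance=7.0)
```

Returning the distance alongside each record corresponds to what the returnDistanceColumnName option adds to the output.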


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

[Figure: sample output of the Hexagon Generator]


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture data near cell edges better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the level number, the higher the level in the hierarchy and the larger the hexagon. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces the same hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
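The semantics of a distance join can be shown with a small, self-contained Python sketch. It is an approximation for illustration only and makes several assumptions: the names haversine_m and join_by_distance and the sample records are hypothetical, distances use a spherical-earth haversine formula rather than the product's distance calculation, and every pair is compared directly, whereas the guide states that joinByDistance uses a geohash precision for its primary join.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters (spherical assumption)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair every left record with each right record within max_distance_m."""
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                # The extra column plays the role of the DistanceColumnName option.
                result.append({**l_row, **r_row, "outputDistance": d})
    return result


# Hypothetical sample data: one customer and two points of interest.
customers = [{"customer": "c1", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.7520, "lat": 42.7370},
    {"poi": "far", "lon": -73.9000, "lat": 42.9000},
]
joined = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
```

The nested loop compares every pair of records, which is why a coarse pre-join on grid cells matters at scale: it limits the exact distance test to records that are already close together.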


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue -> Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment and change the following line from:

use_default_configuration=false

to:

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
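
If you script the builder, the choice of -p can be made explicit. The following is a small sketch under stated assumptions: pgd_builder_args is a hypothetical helper (not part of the product), it only assembles the argument list shown in the usage above, and capping the value at the local CPU count is a design choice of this sketch rather than documented behavior.

```python
import os

def pgd_builder_args(tab_path, max_parallel=None):
    """Build the PGDBuilder argument list for one TAB file (hypothetical helper)."""
    args = ["PGDBuilder", "-f", tab_path]
    if max_parallel is not None:
        cpus = os.cpu_count() or 1
        # Assumption of this sketch: never request more processes than CPUs,
        # and never fewer than one.
        args += ["-p", str(max(1, min(max_parallel, cpus)))]
    return args

default_args = pgd_builder_args(r"C:\data\uk.tab")
limited_args = pgd_builder_args(r"C:\seamless.tab", max_parallel=1)
```

Omitting max_parallel leaves out -p entirely, so the tool falls back to its documented default.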

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
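
To see what mode 775 grants, the sketch below applies it to a temporary directory and reads the permission bits back. It uses only the Python standard library on a POSIX system; the temporary directory stands in for the download location, and describe_mode is a hypothetical helper name.

```python
import os
import stat
import tempfile

def describe_mode(path):
    """Return (octal permission string, symbolic string) for a path."""
    mode = os.stat(path).st_mode
    return oct(stat.S_IMODE(mode))[2:], stat.filemode(mode)

with tempfile.TemporaryDirectory() as download_dir:
    os.chmod(download_dir, 0o775)  # same permission bits as `chmod 775`
    octal, symbolic = describe_mode(download_dir)
```

The symbolic form reads as rwx for the owner, rwx for the group, and r-x for everyone else, which is why every member of the common group can write to the directory.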

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
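
The wildcard and range rules in the table can be sketched in a few lines of Python. This is an illustration of the described semantics only, not the MI SQL engine: the function names are hypothetical, and case-sensitive matching is an assumption because the table does not state how Like treats case.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

def between(value, low, high):
    """Between is inclusive at both ends of the range."""
    return low <= value <= high
```

For example, like("Troy", "Tr_y") and like("Troy", "T%") both hold, while like("Troy", "Tr_") does not because the underscore stands for exactly one character.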

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as "/") are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal "O'haras" is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
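
When query text is assembled in code, the quote rules above can be applied with two small helpers. This is a sketch: quote_literal and quote_identifier are hypothetical names, and the identifier helper does not handle a double quote inside the identifier itself because the rules above do not cover that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes not handled)."""
    return '"' + name + '"'

query = "SELECT * FROM {} WHERE Country = {}".format(
    quote_identifier("/Samples/NamedTables/USA"),
    quote_literal("Canada"),
)
escaped = quote_literal("O'haras")
```

Here escaped holds the same literal as the last example above, with the embedded single quote doubled.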

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
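
The difference between the SPHERICAL and CARTESIAN computation types, and the role of linearUnits, can be illustrated with a short Python sketch. It is not the UDF's implementation: the haversine formula, the mean Earth radius, the conversion factors, and the function names are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumption)

# Meters per unit for the unit codes listed in the Linear Units table.
METERS_PER_UNIT = {
    "mi": 1609.344, "km": 1000.0, "in": 0.0254, "ft": 0.3048, "yd": 0.9144,
    "mm": 0.001, "cm": 0.01, "m": 1.0, "survey ft": 1200.0 / 3937.0, "nmi": 1852.0,
}

def spherical_distance(p1, p2, unit="m"):
    """Great-circle (haversine) distance between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p1[0], p1[1], p2[0], p2[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    meters = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return meters / METERS_PER_UNIT[unit]

def cartesian_distance(p1, p2):
    """Straight-line distance between two (x, y) points, in coordinate units."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

at_equator = spherical_distance((0, 0), (1, 0), "km")
at_60_north = spherical_distance((0, 60), (1, 60), "km")
flat = cartesian_distance((0, 60), (1, 60))
```

One degree of longitude spans roughly 111 km at the equator but only about half that at 60 degrees north, while the cartesian calculation treats both as a distance of 1 coordinate unit. This is why long/lat data is normally measured with spherical logic.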

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry.

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). The curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry.

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
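
The rule that a polygon's perimeter includes its holes can be shown with a small cartesian sketch in Python. The function names and the ring representation (a list of (x, y) tuples whose first and last vertices are equal) are assumptions for illustration; the UDF itself also supports spherical computation, which this sketch does not attempt.

```python
import math

def ring_length(ring):
    """Length of a closed ring given as [(x, y), ...] with first == last."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:]))

def polygon_perimeter(exterior, holes=()):
    """Exterior ring length plus the length of every hole ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)

exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
perimeter = polygon_perimeter(exterior, [hole])
```

The 10 x 10 square contributes 40 units and the 2 x 2 hole contributes 8, so the perimeter is 48 coordinate units rather than 40.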

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The unit type of the offset distance. For valid values, see Linear Units on page 40.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry.

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
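
The effect of the resolution parameter can be pictured with a cartesian sketch in Python that approximates a circle around a point with a fixed number of straight segments. This is an illustration only: buffer_point is a hypothetical name, placing the vertices exactly on the circle is an assumption, and the sketch ignores units and spherical computation.

```python
import math

def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) as a closed ring of vertices."""
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

octagon = buffer_point(5.0, 6.0, 50.0, 8)
```

With a resolution of 8 the ring has eight distinct vertices (an octagon); raising the resolution adds vertices and brings the shape closer to a true circle at the cost of a larger geometry.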

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry.

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
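
For readers who want to see what "smallest convex geometry" means in practice, the following Python sketch computes a convex hull for a set of points using Andrew's monotone chain algorithm. It is a generic textbook technique swapped in for illustration, not the algorithm used by the UDF, and it works on plain (x, y) tuples rather than WritableGeometry objects.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)])
```

The two interior points (2, 2) and (1, 3) are discarded, leaving the four corners of the square as the hull.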

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries.

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system.

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
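
As an illustration of what a coordinate system transformation does to the numbers, the Python sketch below converts a longitude/latitude pair (EPSG:4326) to Web Mercator (EPSG:3857) with the standard spherical Mercator formulas. It is a hand-written sketch of this one transformation, not the general mechanism behind the Transform UDF, and the function name is hypothetical.

```python
import math

WGS84_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857, in meters

def lonlat_to_web_mercator(lon, lat):
    """Convert degrees of longitude/latitude to EPSG:3857 meters."""
    x = WGS84_RADIUS_M * math.radians(lon)
    y = WGS84_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

x, y = lonlat_to_web_mercator(30.0, 20.0)
```

The point at longitude 30, latitude 20 lands roughly 3,340 km east and 2,273 km north of the projection origin, which shows how the same location is expressed in degrees in one system and in meters in the other.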

Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries.

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR for a writable geometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
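
The bounds-filtering use case can be sketched in Python as follows. The sketch computes the minimum bounding rectangle (MBR) of a list of vertices and keeps only the XY rows that fall inside it; the function names and the tuple layout are assumptions for illustration and do not correspond to SDK calls.

```python
def mbr(points):
    """Return (xmin, ymin, xmax, ymax) for an iterable of (x, y) vertices."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

def in_bounds(x, y, bounds):
    """True if (x, y) lies inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax

polygon = [(40, 40), (20, 45), (45, 30), (40, 40)]
bounds = mbr(polygon)
rows = [(30, 35), (50, 35), (25, 44)]
inside = [r for r in rows if in_bounds(r[0], r[1], bounds)]
```

The four values in bounds play the role of ST_XMin, ST_YMin, ST_XMax, and ST_YMax, and the filter corresponds to comparing the X and Y columns of an XY table against them.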

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maximum of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maximum of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the X minimum of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(hellip epsg4326)) FROM src


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
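The relationship between the point ordinate functions (ST_X, ST_Y) and the extent functions (ST_XMin, ST_XMax, ST_YMin, ST_YMax) can be pictured with a small Python sketch. This is not the product's implementation: the helper names `st_x` and `extents` are hypothetical, and a geometry is reduced to a type name plus a list of (x, y) pairs.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float]


def st_x(geom_type: str, coords: List[Coord]) -> Optional[float]:
    """Return the X ordinate of a point, or None for any other geometry."""
    if geom_type != "Point" or len(coords) != 1:
        return None  # mirrors the Null result described for non-point input
    return coords[0][0]


def extents(coords: List[Coord]) -> Optional[Tuple[float, float, float, float]]:
    """Return (xmin, ymin, xmax, ymax) of the coordinates, or None if empty."""
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical sample data: one point and the vertices of a triangle.
point = [(-73.750333, 42.736103)]
triangle = [(-73.75, 42.73), (-73.70, 42.80), (-73.60, 42.70)]

print(st_x("Point", point))       # the X ordinate of the point
print(st_x("Polygon", triangle))  # None, because the geometry is not a point
print(extents(triangle))          # extent values of the triangle
```

For a point, the minimum and maximum values coincide with the point's own ordinates; for any other geometry they describe its bounding rectangle.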


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
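To make the idea of turning a cell ID back into a rectangular boundary concrete, here is a minimal Python sketch of decoding with the publicly documented geohash scheme (base-32 characters, bits alternating between longitude and latitude). It assumes, without confirmation from this guide, that the product's geohash IDs follow that standard scheme; the function names `geohash_bounds` and `bounds_to_wkt` are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) for a geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    is_lon = True  # bits alternate, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # bit 1: keep the upper half
            else:
                rng[1] = mid  # bit 0: keep the lower half
            is_lon = not is_lon
    return lon[0], lat[0], lon[1], lat[1]


def bounds_to_wkt(bounds):
    """Format a bounding rectangle as a closed WKT polygon ring."""
    x1, y1, x2, y2 = bounds
    ring = [(x1, y1), (x2, y1), (x2, y2), (x1, y2), (x1, y1)]
    return "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


print(bounds_to_wkt(geohash_bounds("ezs42")))
```

Each additional character halves the cell five more times, which is why longer IDs correspond to smaller cells.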


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

-- Store the sorted cell IDs in a temporary table
CREATE TEMPORARY TABLE tmptbl AS
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

-- Append the sorted cell IDs to an existing table
INSERT INTO TABLE coordinates_with_hash
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

-- Count the points in each cell and return each cell boundary as WKT
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;
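The aggregation pattern in the last query (hash every point, then group by the hash) can be sketched outside Hive as follows. The encoder below is the publicly documented geohash algorithm; whether the product's GeoHashID returns identical strings is an assumption, and the names `geohash_id` and `points` are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of the given length."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    value = 0
    bits = 0
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if is_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # point is in the upper half
        else:
            value = value << 1
            rng[1] = mid  # point is in the lower half
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)


# Hypothetical sample points as (longitude, latitude) pairs.
points = [(-5.6, 42.6), (-5.61, 42.61), (0.0, 0.0)]

# Equivalent of GROUP BY hashID with count(*): one counter entry per cell.
quantity = Counter(geohash_id(lon, lat, 5) for lon, lat in points)
for cell_id, count in sorted(quantity.items()):
    print(cell_id, count)
```

Because nearby points share a prefix, sorting or grouping on the ID string is enough to bring points in the same cell together, which is what makes the ID useful as an index key.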


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

-- Store the sorted cell IDs in a temporary table
CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

-- Append the sorted cell IDs to an existing table
INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

-- Count the points in each cell and return each cell boundary as WKT
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

-- Store the sorted cell IDs in a temporary table
CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

-- Append the sorted cell IDs to an existing table
INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

-- Count the points in each cell and return each cell boundary as WKT
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

shpCharset
Description: the charset to use when reading a shapefile
Example: 'shpCharset', 'utf-8'

shpCrs
Description: the coordinate reference system to use when reading a shapefile
Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
Description: the operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
    FromWKT(pip_points.geometry, pip_points.crs),
    'STATECAP.TAB',
    map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
        'downloadLocation', '/pb/pip/download',
        'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
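As a conceptual illustration of what a point-in-polygon lookup does for each input row, here is a short Python sketch using the classic ray-casting test. It is not how the UDTF is implemented and ignores holes, multipolygons and coordinate systems; the function names and the sample records are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # X position where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def find_containing(x, y, records):
    """Return the attributes of every record whose ring contains the point."""
    return [attrs for ring, attrs in records if point_in_ring(x, y, ring)]


# Hypothetical records: (ring, attributes), loosely mirroring capital/state columns.
records = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], {"state": "A", "capital": "Alpha"}),
    ([(5, 5), (15, 5), (15, 15), (5, 15)], {"state": "B", "capital": "Beta"}),
]

print(find_containing(2, 2, records))    # only the first square
print(find_containing(7, 7, records))    # both squares overlap here
print(find_containing(20, 20, records))  # no match
```

Like the UDTF, the sketch can return zero, one, or several rows per input point, which is why the Hive query joins the result with LATERAL VIEW.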


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

maxCandidates
Description: the maximum number of results to return (if not set, the default value is 1)
Example: 'maxCandidates', '3'

maxDistance
Description: the maximum distance to search for results (if not set, the default value is no limit)
Example: 'maxDistance', '25'

distanceUnit
Description: the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'

returnDistanceColumnName
Description: the name of the column to use for returning the distance
Example: 'returnDistanceColumnName', 'Miles'

shpCharset
Description: the charset to use when reading a shapefile
Example: 'shpCharset', 'utf-8'

shpCrs
Description: the coordinate reference system to use when reading a shapefile
Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
Description: the operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

queryFilter
Description: search on a subset of the data based on an attribute
Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map('maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
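The effect of the maxCandidates and maxDistance options can be illustrated with a brute-force Python sketch. It is a simplification under stated assumptions: candidates are points rather than arbitrary geometries, distance is great-circle (haversine) distance in meters, and the names `haversine_m`, `search_nearest` and the sample coordinates are hypothetical. The UDTF itself reads TAB or shapefile data and may compute distance differently.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    radius = 6371008.8  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def search_nearest(lon, lat, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, name) pairs, nearest first."""
    scored = []
    for name, clon, clat in candidates:
        d = haversine_m(lon, lat, clon, clat)
        if max_distance_m is None or d <= max_distance_m:  # no limit when unset
            scored.append((d, name))
    scored.sort()
    return scored[:max_candidates]


# Hypothetical candidate points: (name, longitude, latitude).
capitals = [
    ("Albany", -73.7562, 42.6526),
    ("Boston", -71.0589, 42.3601),
    ("Hartford", -72.6734, 41.7658),
]

# Defaults mirror the documented ones: one result, no distance limit.
print(search_nearest(-73.6918, 42.7284, capitals))
# Up to three results, but only those within 150 km.
print(search_nearest(-73.6918, 42.7284, capitals, max_candidates=3, max_distance_m=150_000))
```

Returning the distance alongside each match corresponds to what returnDistanceColumnName adds to the UDTF output.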


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path are:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, representing the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

// Build the search distance: 0.5 miles
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

// Join joinDF to baseDF using geohash precision 7
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

// The options map adds the calculated distance as a column named outputDistance
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
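To show what a distance join produces, independent of Spark, here is a brute-force Python sketch that pairs every record in one list with every record in another list lying within a maximum distance, and adds the distance as an extra field. It is only an illustration: it compares all pairs rather than using a geohash-keyed primary join, it uses haversine distance, and the names `distance_join`, `customers`, `pois` and the field names are hypothetical.

```python
import math

METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    radius = 6371008.8  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def distance_join(left, right, max_miles, distance_column=None):
    """Join rows of left and right whose points lie within max_miles."""
    joined = []
    for l_row in left:
        for r_row in right:
            miles = haversine_m(l_row["longitude"], l_row["latitude"],
                                r_row["lon"], r_row["lat"]) / METERS_PER_MILE
            if miles <= max_miles:
                row = {**l_row, **r_row}  # combine attributes from both sides
                if distance_column:
                    row[distance_column] = miles  # optional extra column
                joined.append(row)
    return joined


# Hypothetical sample data.
customers = [{"customer": "c1", "longitude": -73.6918, "latitude": 42.7284}]
pois = [
    {"poi": "A", "lon": -73.6918, "lat": 42.7320},  # roughly 400 m north
    {"poi": "B", "lon": -73.6918, "lat": 42.7400},  # roughly 1.3 km north
]

for row in distance_join(customers, pois, 0.5, distance_column="outputDistance"):
    print(row["customer"], row["poi"], round(row["outputDistance"], 3))
```

Comparing all pairs scales with the product of the two input sizes, which is the cost a coarse key-based primary join is intended to avoid on large dataframes.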


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars is the path to the LI jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter        Required   Description
-f <file>        yes        Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel>    no         The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
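To make the 775 mode used in step 5 concrete, the following Python sketch decodes an octal mode into the familiar ls-style permission string using only the standard library. The helper name describe_mode is hypothetical and is not part of the product; it is shown only to illustrate which bits 775 sets.

```python
import stat


def describe_mode(mode: int) -> str:
    """Return the ls-style permission string for a directory with this mode.

    describe_mode is an illustrative helper, not part of the product.
    """
    # S_IFDIR marks the entry as a directory so the string starts with "d".
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode: int) -> bool:
    """True when the group write bit is set, which shared downloads rely on."""
    return bool(mode & stat.S_IWGRP)


if __name__ == "__main__":
    print(describe_mode(0o775))
    print(group_can_write(0o775))
```

With mode 775 the owner and the group can read, write, and traverse the directory, while other users can only read and traverse it. That is why every service user that must update downloaded data needs to belong to the common group.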


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator               Definition
Attribute operators    =, <>, !=, <, <=, >, >=
Between                Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects    Returns true if the envelopes (MBRs) of the operands intersect.
Contains               Returns true if the first object contains all of the second object.
Within                 Returns true if the first object is entirely inside the second object.
ContainsCentroid       Returns true if the first object contains the centroid of the second object.
CentroidWithin         Returns true if the first object's centroid is within the second object.
Intersects             Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)              Returns true if the value equals at least one of the values in the literal list or subquery.
Like                   Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                    Returns true if both conditions in the WHERE clause are true.
OR                     Returns true if either the first or second condition is true.
NOT                    Reverses the meaning of the logical operator with which it is used.
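The wildcard behaviour described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is only an illustration of the two wildcards under stated assumptions: the function names are hypothetical, matching is assumed to be case-sensitive, and no escape character is handled, none of which the guide specifies.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a Like pattern into a compiled regular expression.

    Assumptions (not confirmed by the guide): case-sensitive matching
    and no escape character for literal % or _.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("Troy", "T%"))    # percent covers "roy"
    print(like("Troy", "Tr_y"))  # underscore covers "o"
    print(like("Troy", "T_y"))   # underscore cannot cover "ro"
```

The match is anchored to the whole value, so a pattern without wildcards behaves like an equality test.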


Arithmetic Operators

Operator Definition

+    Addition; also the concatenation operator. NOTE: String concatenation also uses &.
-    Subtraction
*    Multiplication
/    Division
^    Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( )      Expression delimiters
' '      String constant delimiters. See Quote Rules on page 85.
" "      Quoted identifier delimiters
% _      Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
,        List items and function argument separators

@        Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
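When MI SQL statements are assembled by a program, the quote rules above can be applied with two small helpers. The Python sketch below is illustrative only: quote_literal and quote_identifier are hypothetical names, and the identifier helper assumes the name contains no double quotation marks, because the guide does not describe how those would be escaped.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes.

    Assumption: the name contains no double quotation marks; escaping of
    embedded double quotes is not covered by the guide.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quotation mark")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    business = quote_literal("O'hara's")
    print("SELECT * FROM " + table + " WHERE Business = " + business)
```

Quoting values through a single helper keeps the doubling rule in one place, which avoids statements that break when a value happens to contain an apostrophe.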


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits[, String computationType])

Parameters

Parameter         Type               Description
geometry1         WritableGeometry   The first instance of a WritableGeometry.
geometry2         WritableGeometry   The second instance of a WritableGeometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units on page 33.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

The following table lists the valid values for unit type:

Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles

Return Values

Return Type   Description
Double        The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
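The difference between the SPHERICAL and CARTESIAN computation types can be illustrated for two points with a short Python sketch. This is not the product's implementation: the haversine formula, the mean Earth radius constant, and the function names are all assumptions chosen for illustration, and the guide does not state which spherical formula the Distance function uses.

```python
import math

# Mean Earth radius in meters. This constant is an assumption for the sketch;
# the guide does not state which radius or formula the product uses.
EARTH_RADIUS_M = 6371008.8


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the (projected) coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


if __name__ == "__main__":
    # One degree of longitude covers less ground at higher latitudes,
    # which only the spherical computation accounts for.
    print(spherical_distance_m(0.0, 0.0, 1.0, 0.0))
    print(spherical_distance_m(0.0, 60.0, 1.0, 60.0))
    print(cartesian_distance(0.0, 0.0, 3.0, 4.0))
```

Applying cartesian logic directly to long/lat degrees would return a value in degrees and ignore the curvature of the earth, which is consistent with SPHERICAL being the only valid type listed for geographic coordinate systems.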


Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits[, String computationType])

Parameters

Parameter         Type               Description
geometry          WritableGeometry   The input geometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units on page 35.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles

Return Values

Return Type   Description
Double        The length of the geometry.

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;


Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits[, String computationType])

Parameters

Parameter         Type               Description
geometry          WritableGeometry   The input geometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units on page 37.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles

Return Values

Return Type   Description
Double        The perimeter of the geometry.

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
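The rule that a polygon's perimeter is the sum of the lengths of all its rings, holes included, can be sketched in Python for the cartesian case. The function names are hypothetical and the sketch works on plain coordinate lists rather than on WritableGeometry; it illustrates the definition only and does not reproduce the product's computation.

```python
import math


def ring_length(ring):
    """Cartesian length of a ring given as a list of (x, y) tuples.

    The ring is closed automatically if its last vertex differs from its first.
    """
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


def polygon_perimeter(exterior, holes=()):
    """Sum of the lengths of the exterior ring and every hole ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
    print(polygon_perimeter(square))          # exterior only
    print(polygon_perimeter(square, [hole]))  # exterior plus hole
```

Adding a hole increases the perimeter even though it reduces the area, because the hole contributes its own boundary.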


Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution[, String computationType])

Parameters

Parameter         Type               Description
geometry          WritableGeometry   The geometry to buffer.
offset            Number             The distance from the input geometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units on page 40.
resolution        Number             Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles

Return Values

Return Type        Description
WritableGeometry   A geometry consisting of all the points within the offset distance of the input geometry.

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
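The effect of the resolution parameter can be sketched in Python for the simplest case, the cartesian buffer of a single point. The function name point_buffer is hypothetical, and placing the vertices exactly on the circle is an assumption made for illustration; the guide only states that a resolution of 8 yields an octagon.

```python
import math


def point_buffer(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with a closed ring.

    `resolution` is the number of straight-line segments used for the circle.
    Assumption: vertices are placed on the circle itself (an inscribed polygon).
    """
    if resolution < 3:
        raise ValueError("resolution must be at least 3 to form a polygon")
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring
    return ring


if __name__ == "__main__":
    octagon = point_buffer(5.0, 6.0, 50.0, 8)
    print(len(octagon) - 1)  # number of segments in the ring
```

Doubling the resolution doubles the number of vertices that must be stored and processed, which matches the note that larger resolution values take more time and space to compute.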


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The input geometry.

Return Values

Return Type        Description
WritableGeometry   The convex hull of the geometry.

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
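To show what "smallest convex geometry that contains all the points" means in practice, the following Python sketch computes a convex hull from a list of vertices using Andrew's monotone chain algorithm. The choice of algorithm and the function names are assumptions for illustration; the guide does not say how the ConvexHull function is implemented.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain).

    Illustrative only; not the product's algorithm.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    # Interior point (1, 1) does not appear in the hull.
    print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

Only the outermost vertices survive, so interior rings such as the hole in the MULTIPOLYGON example above cannot contribute to the result.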


Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). It returns the geometry consisting of direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.

Return Values

Return Type        Description
WritableGeometry   The geometry formed from the direct positions that are common to both input geometries.

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter   Type               Description
geometry    WritableGeometry   The source input geometry.
CRS         String             The destination coordinate system for the geometry.

Return Values

Return Type        Description
WritableGeometry   The geometry transformed to the destination coordinate system.

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
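As an illustration of what a coordinate system transformation does to a single point, the Python sketch below applies the standard spherical Mercator formulas that take long/lat degrees (epsg:4326) to Web Mercator meters (epsg:3857). It is a sketch under stated assumptions: the function name is hypothetical, only this one pair of coordinate systems is covered, and transformations that involve a datum change (for example from epsg:4267) need additional steps that are not shown.

```python
import math

# Semi-major axis of the WGS 84 ellipsoid in meters, used as the sphere
# radius by the Web Mercator projection.
WGS84_SEMI_MAJOR_M = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project long/lat degrees (epsg:4326) to Web Mercator meters (epsg:3857)."""
    x = WGS84_SEMI_MAJOR_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


if __name__ == "__main__":
    print(lonlat_to_web_mercator(30.0, 20.0))
```

The projection is undefined at the poles, where the tangent term grows without bound, so inputs are normally limited to latitudes of roughly 85 degrees north and south.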


Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.

Return Values

Return Type        Description
WritableGeometry   The geometry that represents the union of the input geometries.

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR for a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
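The idea of filtering XY records by the bounds of a geometry can be sketched in Python. The helper names mbr and in_mbr are hypothetical and operate on plain coordinate lists; they only illustrate the four values that the ST_XMin, ST_YMin, ST_XMax, and ST_YMax functions are described as returning.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, box):
    """True when the point lies inside or on the edge of the bounding rectangle."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


if __name__ == "__main__":
    triangle = [(40, 40), (20, 45), (45, 30), (40, 40)]
    box = mbr(triangle)
    records = [(25, 35), (50, 50), (44, 31)]
    print([r for r in records if in_mbr(r[0], r[1], box)])
```

A bounding-rectangle test is cheap but approximate: a record can fall inside the MBR and still lie outside the geometry itself, so it is typically used as a first-pass filter.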


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash, but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision The shape of the cell is rectangular

Function Registration

create function GeoHashBoundary as compbbigdataspatialhivegridGeoHashBoundary

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable

SELECT GeoHashBoundary(ebvnk)

Syntax

GeoHashBoundary(Number|String X Number|String Y Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary("-73.750333", "42.736103", 3);
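The relationship between a cell ID and its rectangular boundary can be sketched in Python by decoding a standard geohash string back into a bounding box and writing it as WKT. The names geohash_bounds and bounds_to_wkt are hypothetical, and the sketch assumes the standard public geohash scheme; the geometry actually returned by GeoHashBoundary is a WritableGeometry and may differ in detail.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) for a standard geohash."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    refine_lon = True  # bits alternate: longitude first, then latitude
    for ch in cell_id.lower():
        value = BASE32.index(ch)  # raises ValueError for an invalid character
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if refine_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            refine_lon = not refine_lon
    return lon_lo, lat_lo, lon_hi, lat_hi

def bounds_to_wkt(bounds):
    """Format a bounding box as a closed WKT polygon ring."""
    x1, y1, x2, y2 = bounds
    return (f"POLYGON (({x1} {y1}, {x2} {y1}, {x2} {y2}, "
            f"{x1} {y2}, {x1} {y1}))")

print(bounds_to_wkt(geohash_bounds("dre")))
```

Each extra character adds five bits of subdivision, so a longer ID always describes a smaller rectangle nested inside the rectangle of its prefix.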


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier (hexagon hash) of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary("PF625028642");

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier (square hash) of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary("03332");

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: "shpCharset", "utf-8"

shpCrs The coordinate reference system to use when reading a shapefile. Example: "shpCrs", "epsg:4326"

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: "remoteDataSourceLocation", "hdfs:///data/mydata.zip"

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: "downloadLocation", "/pb/downloads"
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: "downloadGroup", "pbdownloads"
For more information, see Download Permissions on page 83.

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
"STATECAP.TAB",
map("remoteDataSourceLocation", "hdfs:///data/pip/capitals.zip",
"downloadLocation", "/pb/pip/download", "downloadGroup", "pbdownloads"))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
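The row-per-match behavior of a point-in-polygon search can be illustrated with a small Python sketch based on the classic ray-casting test. It is not the product's implementation: the helper names and the in-memory feature list are hypothetical stand-ins for the TAB or shapefile, and only simple polygon rings are handled.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def local_point_in_polygon(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    x, y = point
    return [attrs for ring, attrs in features if point_in_ring(x, y, ring)]

# Hypothetical data: two overlapping square "regions" with attributes.
features = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], {"state": "A"}),
    ([(5, 5), (15, 5), (15, 15), (5, 15)], {"state": "B"}),
]
print(local_point_in_polygon((7, 7), features))
```

A point inside several polygons yields several rows, and a point inside none yields no rows. In Hive, a plain LATERAL VIEW drops input rows for which the UDTF produces nothing, whereas LATERAL VIEW OUTER keeps them with NULL columns.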

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the nearest geometry or geometries, contained in the specified TAB or shapefile, to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: "maxCandidates", "3"

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: "maxDistance", "25"

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: "distanceUnit", "mi"

returnDistanceColumnName The name of the column to use for returning the distance. Example: "returnDistanceColumnName", "Miles"

shpCharset The charset to use when reading a shapefile. Example: "shpCharset", "utf-8"

shpCrs The coordinate reference system to use when reading a shapefile. Example: "shpCrs", "epsg:4326"

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: "remoteDataSourceLocation", "hdfs:///data/mydata.zip"

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: "downloadLocation", "/pb/downloads"
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: "downloadGroup", "pbdownloads"
For more information, see Download Permissions on page 83.

queryFilter Search on a subset of the data based on an attribute. Example: "queryFilter", "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), "STATECAP.TAB",
map("maxCandidates", "3", "remoteDataSourceLocation", "hdfs:///data/search/capitals.zip",
"downloadLocation", "/pb/search/download", "downloadGroup", "pbdownloads",
"queryFilter", "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
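How the maxCandidates and maxDistance options shape the result can be sketched in Python. This is a brute-force illustration under stated assumptions, not the product's algorithm: candidates are plain points rather than arbitrary geometries, distances are great-circle (haversine) distances in meters, and the names search_nearest, max_candidates, and max_distance_m are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    lon, lat = point
    scored = []
    for name, (c_lon, c_lat) in candidates:
        d = haversine_m(lon, lat, c_lon, c_lat)
        if max_distance_m is None or d <= max_distance_m:  # no limit by default
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]

# Hypothetical candidate points along the equator.
candidates = [("a", (0, 0)), ("b", (1, 0)), ("c", (2, 0)), ("d", (3, 0))]
print(search_nearest((0.1, 0), candidates, max_candidates=3))
```

With the defaults, only the single nearest candidate comes back. Raising max_candidates returns more rows per search point, and setting a maximum distance can reduce the result to fewer rows than requested, or to none.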

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
--master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
/dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
-output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
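The semantics of a distance join, including the optional distance column, can be sketched in plain Python. This is a nested-loop illustration under stated assumptions rather than the Spark implementation: it skips the geohash-based primary join that the geohashPrecision parameter controls, uses great-circle (haversine) distances, and the function and field names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius in meters
METERS_PER_MILE = 1609.344   # international mile

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_m, distance_column=None):
    """Pair every left row with each right row within max_distance_m."""
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"],
                            r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                row = {**l_row, **r_row}  # combine attributes of both rows
                if distance_column is not None:
                    row[distance_column] = d
                result.append(row)
    return result

# Hypothetical sample rows: one customer and two points of interest.
customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [{"poi": "near", "lon": -73.752, "lat": 42.737},
        {"poi": "far", "lon": -73.9, "lat": 42.8}]
rows = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE, "outputDistance")
print([r["poi"] for r in rows])
```

A nested loop compares every pair of rows, which does not scale to large inputs. Bucketing both sides by grid cell first, so that only points in the same or nearby cells are compared, is the usual way to make such joins tractable.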


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver \
--master yarn --deploy-mode cluster \
--jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
/path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group

Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue

You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
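As a rough illustration of what the 775 mode grants, the following Python sketch decodes the octal value with the standard stat module. The helper name describe_mode is hypothetical and is not part of the product; it only shows that the owner and group get read, write, and execute access, while other users get read and execute only.

```python
import stat

def describe_mode(mode):
    """Return an ls-style permission string for a directory with the given mode bits."""
    return stat.filemode(stat.S_IFDIR | mode)

# 775: owner rwx, group rwx, others r-x
print(describe_mode(0o775))

# The group-write bit is what lets every member of the shared group update downloads.
group_can_write = bool(0o775 & stat.S_IWGRP)
others_can_write = bool(0o775 & stat.S_IWOTH)
```

With this mode, any service user added to the shared group can write to the download directory, while users outside the group cannot.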


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
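The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is only an illustration of the two wildcards, not the MI SQL implementation: the function names are hypothetical, and the sketch assumes case-sensitive matching, which the guide does not specify.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("Canada", "Can%"))    # prefix match with %
print(like("Canada", "C_nada"))  # single-character wildcard
print(like("Canada", "C_da"))    # _ cannot stand for several characters
```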


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
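When queries are assembled programmatically, the quote rules above can be applied with two small helpers. The following Python sketch is an assumption-labeled illustration: quote_literal and quote_identifier are hypothetical names, and the identifier helper simply wraps the name in double quotes without handling embedded double quotes, because the guide does not describe that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    return '"' + name + '"'

query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
print(query)
```

For real workloads, parameterized queries are usually preferable to string assembly where the client library supports them.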


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes

3001 Summer Street

Stamford CT 06926-0700

USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.

All rights reserved


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Examples

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
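To give a feel for the difference between the SPHERICAL and CARTESIAN computation types, here is a minimal Python sketch for two points. It assumes a haversine formula on a sphere of radius 6,371,000 m for the spherical case and plain Euclidean distance for the Cartesian case. These are textbook formulas chosen for illustration; the guide does not state which formulas or Earth model the product uses, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius, not taken from the guide

def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the (projected) coordinates."""
    return math.hypot(x2 - x1, y2 - y1)

# One degree of longitude along the equator, in meters on the assumed sphere.
one_degree = spherical_distance_m(0.0, 0.0, 1.0, 0.0)
# The same coordinates treated as planar values give a distance of 1 coordinate unit.
planar = cartesian_distance(0.0, 0.0, 1.0, 0.0)
```

The contrast shows why Cartesian logic is not offered for geographic (long/lat) coordinate systems: degrees are not a linear unit.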


Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). The curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles


Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
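The effect of the resolution parameter can be pictured with a small planar sketch in Python. It approximates the buffer of a point by a regular polygon whose vertex count equals the resolution, so a resolution of 8 yields an octagon. This is a simplification under stated assumptions: it uses Cartesian logic only, places the vertices on the circle, and uses a hypothetical function name; the product's actual construction is not described in this guide.

```python
import math

def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` straight segments."""
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring by repeating the first vertex
    return ring

octagon = buffer_point(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1, "segments")
```

Doubling the resolution doubles the number of vertices, which is why larger values cost more time and space.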


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(hivetable.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
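To illustrate what a convex hull is, the following Python sketch computes the hull of the vertices from the MULTIPOLYGON example using Andrew's monotone chain algorithm. This is a standard textbook algorithm chosen for illustration; the guide does not say which algorithm the ConvexHull UDF uses, and the sketch works on planar coordinates only.

```python
def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order, starting at the lowest-left point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()  # drop points that would make a right turn or lie on the edge
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # the last point of each chain starts the other

# All ring vertices of the MULTIPOLYGON, including the hole.
vertices = [
    (40, 40), (20, 45), (45, 30),
    (20, 35), (10, 30), (10, 10), (30, 5), (45, 20),
    (30, 20), (20, 15), (20, 25),
]
hull = convex_hull(vertices)
print(hull)
```

The hole vertices and the vertex (20, 35) fall inside the hull, so they do not appear in the result.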


Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). It returns the geometry consisting of direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
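For intuition about what a transformation from epsg:4326 to epsg:3857 does to a single point, the following Python sketch applies the commonly published spherical Web Mercator formulas. It is a hand-written illustration, not the product's transformation engine: the function name is hypothetical, and a general-purpose Transform must also handle datum shifts and other coordinate systems that this sketch ignores.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by epsg:3857

def to_web_mercator(lon, lat):
    """Convert a WGS 84 longitude/latitude in degrees to Web Mercator meters."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# The point used in the second example above.
x, y = to_web_mercator(30.0, 20.0)
print(x, y)
```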


Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another directly. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR for a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
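Taken together, ST_XMin, ST_YMin, ST_XMax, and ST_YMax describe the MBR of a geometry. The Python sketch below shows the idea on a plain list of coordinates and then uses the MBR to filter XY records, which is the use case the Observer functions are meant to support. The function names and sample records are hypothetical and exist only for illustration.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

def in_mbr(x, y, box):
    """Return True if the point lies inside or on the edge of the MBR."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

ring = [(40, 40), (20, 45), (45, 30), (40, 40)]
box = mbr(ring)

# Hypothetical XY records filtered by the bounds of the geometry.
records = [(25, 35), (50, 50), (45, 45)]
inside = [r for r in records if in_mbr(r[0], r[1], box)]
print(box, inside)
```

An MBR test is only a coarse filter: a point can fall inside the MBR and still lie outside the geometry itself.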


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash, but has the advantage that when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
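The relationship between an ID and its cell boundary can be sketched in Python using the publicly documented geohash scheme, which interleaves longitude and latitude bits and encodes them in base 32. This is an assumption: the guide calls the ID "well-known" but does not confirm that the UDFs follow this exact scheme, and the function names below are hypothetical stand-ins rather than the product's API.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash string of length `precision`."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    chars, bits, value, even = [], 0, 0, True
    while len(chars) < precision:
        if even:  # even bits halve the longitude range
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if x >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:     # odd bits halve the latitude range
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if y >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | bit
        even = not even
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

def geohash_boundary(hash_id):
    """Decode a geohash string into its cell bounds (xmin, ymin, xmax, ymax)."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    even = True
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if even:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            even = not even
    return lon_lo, lat_lo, lon_hi, lat_hi

cell_id = geohash_id(-73.750333, 42.736103, 3)
xmin, ymin, xmax, ymax = geohash_boundary(cell_id)
print(cell_id, (xmin, ymin, xmax, ymax))
```

Because each extra character adds five more halvings, longer IDs describe smaller cells, and IDs that share a prefix fall inside the same larger cell. That prefix property is what makes the IDs convenient to sort and group.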


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x y precision) FROM hivetable

SELECT HexagonID(-73750333 42736103 3)

CREATE TEMPORARY TABLE tmptbl ASSELECT (HexagonID(x y 10)) AS hashIDFROM coordinates ORDER BY hashID

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x y 10)) AS hashIDFROM coordinates ORDER BY hashID

SELECT chashID ToWKT(HexagonBoundary(chashID))count () AS quantity FROM (SELECT HexagonID(x y 10)AS hashID FROM coordinates)c GROUP BY chashID


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group to apply to downloaded data on the local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions.


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points to search with. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, there is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group to apply to downloaded data on the local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points to search with. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.


Spark

Spark Jobs

To process large data sets, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
  --master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
  /dir/on/server/spectrum-bigdata-li-spark2-hexgen-<version>.jar \
  -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower signal strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from the second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceJoinOption.DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin

Hue notebook Integrating the Location Intelligence Jar with Hue

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

spark-submit --class com.example.spark.app.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you must use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following three lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the master node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
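For the EMR branch (step 2), the edit to hue.ini can also be scripted. The following is a minimal sketch and not part of the product: the function name enable_default_configuration is hypothetical, and it assumes the line is present in the file with the value false, either active or commented out with # characters. It writes the modified content to standard output so the result can be reviewed before the original file is replaced.

```shell
#!/bin/sh
# enable_default_configuration FILE
# Hypothetical helper: prints FILE with the use_default_configuration line
# uncommented and set to true. The input file itself is not modified.
enable_default_configuration() {
    sed -E 's/^([[:space:]]*)#*[[:space:]]*use_default_configuration[[:space:]]*=[[:space:]]*false/\1use_default_configuration=true/' "$1"
}
```

For example, enable_default_configuration /etc/hue/conf/hue.ini > /tmp/hue.ini.new produces a candidate file that can be compared with the original before it is copied into place.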


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching polygon-based data. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which has a PGD file created for each sub-TAB.

Also, the system no longer uses a PGD file once you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
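Because PGDBuilder takes a single TAB file per call, indexing a directory of TAB files means one call per file. The following is a minimal sketch in POSIX shell, assuming PGDBuilder is on the PATH and the TAB files use a lowercase .tab extension. The function name build_pgd_indexes and the PGD_CMD override (useful for a dry run) are hypothetical and not part of the product.

```shell
#!/bin/sh
# build_pgd_indexes DIR [PARALLEL]
# Runs PGDBuilder once for every .tab file directly inside DIR.
# Set PGD_CMD=echo beforehand to print the arguments instead of running the tool.
build_pgd_indexes() {
    dir=$1
    parallel=${2:-}
    for tab in "$dir"/*.tab; do
        # Skip the unexpanded pattern when DIR contains no .tab files
        [ -e "$tab" ] || continue
        if [ -n "$parallel" ]; then
            ${PGD_CMD:-PGDBuilder} -f "$tab" -p "$parallel"
        else
            ${PGD_CMD:-PGDBuilder} -f "$tab"
        fi
    done
}
```

Setting PGD_CMD=echo and then calling build_pgd_indexes with a directory lists the -f and -p arguments that would be passed, which is a cheap way to check the file selection before building any index.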


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
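After the last step, it can be useful to confirm that the download directory ended up with the expected group and mode. The following is a minimal sketch, assuming GNU coreutils stat (the -c option is GNU-specific); the function name check_download_dir is hypothetical.

```shell
#!/bin/sh
# check_download_dir DIR GROUP
# Prints "ok" when DIR belongs to GROUP and has mode 775;
# otherwise prints what was found and returns a non-zero status.
check_download_dir() {
    dir=$1
    group=$2
    # GNU stat: %a is the octal mode, %G is the group name
    actual=$(stat -c '%a %G' "$dir") || return 2
    if [ "$actual" = "775 $group" ]; then
        echo "ok"
    else
        echo "mismatch: found '$actual', expected '775 $group'"
        return 1
    fi
}
```

With the names used in the steps above, the call would be check_download_dir /pb/downloads pbdownloads.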


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first object's centroid is within the second object

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or subquery

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
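To make the wildcard semantics of the Like operator concrete, the sketch below mimics them in POSIX shell by translating _ to ? and % to * and then using shell pattern matching. This is an illustration only, not how MI SQL evaluates Like: the function name like_match is hypothetical, matching here is case-sensitive (this guide does not state whether Like is), and patterns that contain the shell metacharacters *, ?, [ or \ are not handled.

```shell
#!/bin/sh
# like_match VALUE PATTERN
# Succeeds when VALUE matches PATTERN, where _ stands for exactly one
# character and % stands for zero, one, or more characters.
like_match() {
    # Translate the Like wildcards into shell glob wildcards
    glob=$(printf '%s' "$2" | tr '_%' '?*')
    case $1 in
        $glob) return 0 ;;
        *) return 1 ;;
    esac
}
```

Under this sketch, like_match "Central Park" "%Park" succeeds because % absorbs the leading text, while like_match "Parking" "%Park" fails because nothing in the pattern accounts for the trailing "ing".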


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List item and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is otherwise unable to parse them correctly. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
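When a filter expression is assembled by a script, the doubled-single-quote rule can be applied mechanically. The following is a minimal sketch in POSIX shell; the function name sql_quote is hypothetical and not part of the product.

```shell
#!/bin/sh
# sql_quote VALUE
# Prints VALUE as a single-quoted string literal, doubling any embedded
# single quotes as the quote rules require.
sql_quote() {
    # Double every single quote, then wrap the result in single quotes
    escaped=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "'%s'" "$escaped"
}
```

For example, sql_quote "O'haras" prints 'O''haras', which matches the literal used in the Streets query.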


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

  • For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

  • For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

  • For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
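To make the difference between the CARTESIAN and SPHERICAL computation types concrete, here is a minimal Python sketch. It is not the product's implementation: the haversine formula, the mean Earth radius, and the function names are assumptions chosen for illustration, and the real UDF may use a different Earth model.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; an assumption for this sketch


def spherical_length_m(points):
    """Sum great-circle (haversine) distances between consecutive
    (longitude, latitude) points given in degrees. Result is in meters."""
    total = 0.0
    for (lon1, lat1), (lon2, lat2) in zip(points, points[1:]):
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = phi2 - phi1
        dlmb = math.radians(lon2 - lon1)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
        total += 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return total


def cartesian_length(points):
    """Sum straight-line distances between consecutive (x, y) points.
    Result is in the units of the coordinates."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Conversion factors for a few of the linear units listed above.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}


def convert_from_meters(meters, unit):
    """Convert a length in meters to one of the units in METERS_PER_UNIT."""
    return meters / METERS_PER_UNIT[unit]
```

The same list of vertices gives very different numbers under the two interpretations: for long/lat data the Cartesian result is in degrees, which is why SPHERICAL is the only valid type for geographic coordinate systems.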


Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

  • For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

  • For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

  • For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
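The rule that a polygon's perimeter is the sum of the lengths of all its rings, holes included, can be sketched as follows. This is a planar (CARTESIAN-style) illustration with hypothetical function names, not the product's code.

```python
import math


def ring_length(ring):
    """Length of a ring given as a list of (x, y) vertices.
    The closing edge back to the first vertex is added if missing."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def polygon_perimeter(exterior, holes=()):
    """Perimeter = exterior ring length + length of every hole ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_perimeter(square, [hole]))  # exterior 40 plus hole 8
```

Note that a hole increases the perimeter even though it reduces the area.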


Processing Functions

The following Processing functions are available:

  • Buffer
  • ConvexHull
  • Intersection
  • Transform
  • Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry.

linearUnits String The unit in which the offset is expressed. For valid values, see Linear Units.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

  • For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

  • For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

  • For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles


Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
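The effect of the resolution parameter is easiest to see for a point buffer. The following Python sketch builds the ring in a flat plane; it is an assumption-laden illustration (the function name `buffer_point` is hypothetical, and the real UDF also handles units, coordinate systems, and non-point geometries).

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with
    `resolution` straight segments. Returns a closed ring of vertices."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


def regular_polygon_area(offset, resolution):
    """Area enclosed by the ring above; approaches pi * offset**2
    as the resolution grows."""
    return 0.5 * resolution * offset ** 2 * math.sin(2 * math.pi / resolution)


octagon = buffer_point(5.0, 6.0, 50.0, 8)  # resolution 8 gives an octagon
```

Because the vertices lie on the circle, the polygon always sits inside the true buffer; raising the resolution shrinks the gap at the cost of more vertices per geometry.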


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
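For readers unfamiliar with convex hulls, the sketch below computes one with Andrew's monotone chain algorithm. This is a well-known textbook technique swapped in for illustration; the guide does not say which algorithm the ConvexHull UDF uses.

```python
def convex_hull(points):
    """Return the convex hull of a set of (x, y) points as a list of
    vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Vertices of the MULTIPOLYGON used in the example above.
vertices = [(40, 40), (20, 45), (45, 30), (20, 35), (10, 30), (10, 10),
            (30, 5), (45, 20), (30, 20), (20, 15), (20, 25)]
hull = convex_hull(vertices)
```

Interior vertices, including every vertex of the hole, drop out of the result because they never lie on the outer boundary.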


Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_Point(30, 20), 'epsg:3857');
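As an illustration of what a transformation from epsg:4326 to epsg:3857 does to a single point, the following Python sketch applies the standard spherical Web Mercator forward formulas. It is a sketch of that one well-known projection only; the function name is hypothetical, and the Transform UDF supports many more coordinate systems through its own engine.

```python
import math

WGS84_RADIUS_M = 6378137.0  # sphere radius used by epsg:3857


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project a longitude/latitude pair in degrees (epsg:4326)
    to Web Mercator meters (epsg:3857)."""
    x = WGS84_RADIUS_M * math.radians(lon_deg)
    y = WGS84_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# The point used in the second example above.
x, y = wgs84_to_web_mercator(30.0, 20.0)
print(x, y)
```

The x value scales linearly with longitude, while y stretches increasingly toward the poles, which is why the projection is undefined at latitudes of exactly 90 degrees.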


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the ordinates of the transformed point.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) of a WritableGeometry.

The following Observer functions are available:

  • ST_X
  • ST_XMax
  • ST_XMin
  • ST_Y
  • ST_YMax
  • ST_YMin
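The idea of filtering an XY table by the bounds of a geometry can be sketched in Python as follows. The function names and the dictionary-based rows are assumptions made for the illustration; in Hive the four bounds would come from ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(points):
    """Minimum bounding rectangle of a list of (x, y) vertices,
    returned as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(rows, bounds):
    """Keep the rows of an XY table whose point falls inside the bounds.
    Each row is a dict with numeric 'x' and 'y' entries."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in rows
            if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


polygon = [(2, 1), (8, 3), (6, 9), (1, 5)]
table = [
    {"id": 1, "x": 4, "y": 4},
    {"id": 2, "x": 9, "y": 4},
    {"id": 3, "x": 1, "y": 9},
]
kept = filter_by_bounds(table, mbr(polygon))
```

An MBR test is a coarse filter: row 3 lies inside the rectangle but outside the polygon itself, so an exact predicate is still needed when precise containment matters.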


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs, one for each of three grid cell shapes:

  • rectangular (geohash)
  • hexagon (hexagon hash)
  • square (square hash)

Hashes are useful for analysis and for interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

  • GeoHashBoundary
  • GeoHashID
  • HexagonBoundary
  • HexagonID
  • SquareHashBoundary
  • SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
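The relationship between a point, a precision, a hash ID, and a cell boundary can be sketched with the public geohash scheme. This is a minimal Python illustration under the assumption that the IDs follow the standard geohash encoding (base-32 alphabet, longitude and latitude bits interleaved); the function names are hypothetical, and the product's output may differ in detail.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of `precision` characters."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid  # keep the upper half
        else:
            value = value * 2
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_bounds(hash_id):
    """Decode a geohash to its cell as (xmin, ymin, xmax, ymax)."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_rng[0], lat_rng[0], lon_rng[1], lat_rng[1]
```

Two properties follow directly from the encoding: a lower-precision ID is always a prefix of the higher-precision ID for the same point, and sorting IDs as strings keeps cells that share a parent cell next to each other. That is what makes the IDs useful as keys for ORDER BY and GROUP BY as in the queries above.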


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

  • LocalPointInPolygon
  • LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies on or within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
Description: The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value of pb.download.group set in Hive (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions.
Example: 'downloadGroup', 'pbdownloads'

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 67

Spatial

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that contains the search point is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.
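To make the point-in-polygon idea concrete, here is a minimal Python sketch of the classic ray-casting test. It is not the product's implementation: the function names, the planar coordinates, and the dictionary of named polygons are all assumptions made for illustration, and holes and points exactly on a boundary are not handled.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_ring(point: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: count the ring edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point: Point, polygons: Dict[str, Sequence[Point]]) -> List[str]:
    """Return the names of all polygons that contain the point (hypothetical helper)."""
    return [name for name, ring in polygons.items() if point_in_ring(point, ring)]


# Two overlapping squares stand in for records of a TAB file or shapefile.
polygons = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(1, 1), (6, 1), (6, 6), (1, 6)],
}
print(polygons_containing((2, 2), polygons))
print(polygons_containing((5, 5), polygons))
```

As with the UDTF, a point that falls inside several geometries yields one result per geometry.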

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'"))
nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
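The interplay of maxCandidates and maxDistance can be sketched in a few lines of Python. This is an illustration only, not the UDTF's implementation: the function name, the candidate list, and the use of planar (Euclidean) distance are assumptions.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def search_nearest(
    point: Point,
    candidates: Sequence[Tuple[str, Point]],
    max_candidates: int = 1,
    max_distance: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    scored = ((math.dist(point, xy), name) for name, xy in candidates)
    if max_distance is not None:
        # Drop candidates beyond the search radius before ranking.
        scored = (item for item in scored if item[0] <= max_distance)
    return [(name, d) for d, name in heapq.nsmallest(max_candidates, scored)]


capitals = [("A", (0.0, 1.0)), ("B", (0.0, 3.0)), ("C", (0.0, 10.0))]
print(search_nearest((0.0, 0.0), capitals))
print(search_nearest((0.0, 0.0), capitals, max_candidates=3, max_distance=5.0))
```

With the defaults, only the single nearest record is returned; raising max_candidates returns more records, and max_distance can still cut the list short.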

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output (figure)

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
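When choosing a geohashPrecision, it can help to know how large a geohash cell is at each precision. The Python sketch below assumes the standard geohash scheme (5 bits per character, alternating between longitude and latitude, longitude first); whether the product uses exactly this scheme internally is an assumption, and the function name is hypothetical.

```python
from typing import Tuple


def geohash_cell_size_degrees(precision: int) -> Tuple[float, float]:
    """Return (width, height) in degrees of a standard geohash cell."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # longitude receives the extra bit when the total is odd
    lat_bits = bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


for p in (1, 5, 7, 12):
    width, height = geohash_cell_size_degrees(p)
    print(f"precision {p}: {width:.10f} x {height:.10f} degrees")
```

Each additional character divides a cell into 32 smaller cells, which is consistent with the note that higher precision values may require more memory.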


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
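The semantics of a distance join can be illustrated without Spark. The Python sketch below is a brute-force stand-in, not the SDK's algorithm (which, per the parameter table, uses a geohash for the primary join). The record layout, function names, and the haversine formula with a mean Earth radius are assumptions chosen for illustration.

```python
import math
from typing import Dict, List

EARTH_RADIUS_M = 6371008.8   # mean Earth radius; spherical approximation
METERS_PER_MILE = 1609.344


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left: List[Dict], right: List[Dict], max_distance_m: float) -> List[Dict]:
    """Pair every left record with every right record within max_distance_m."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                # Mirrors the DistanceColumnName option: keep the computed distance.
                joined.append({**l, **r, "outputDistance": d})
    return joined


customers = [{"id": 1, "longitude": 0.0, "latitude": 0.0}]
pois = [
    {"poi": "near", "lon": 0.005, "lat": 0.0},
    {"poi": "far", "lon": 0.02, "lat": 0.0},
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([row["poi"] for row in result])
```

The nested loop compares every pair, which does not scale; a geohash-based primary join exists to avoid exactly that cost on large dataframes.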


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the LI jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hue.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. All service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
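To see what the octal mode 775 grants, the following Python sketch decodes it with the standard library's stat module. It is illustrative only; the helper names are hypothetical and nothing here touches the file system.

```python
import stat


def describe_directory_mode(mode: int) -> str:
    """Render permission bits the way `ls -l` shows them for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode: int) -> bool:
    """True when members of the directory's group may create or replace files."""
    return bool(mode & stat.S_IWGRP)


mode = 0o775
print(describe_directory_mode(mode))
print("group can write:", group_can_write(mode))
print("others can write:", bool(mode & stat.S_IWOTH))
```

The owner and the group receive read, write, and execute, while all other users receive read and execute only. That is why every service user that downloads data has to be a member of the common group.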


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that contains wildcard characters. Two wildcards are used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
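The wildcard rules of the Like operator can be sketched by translating a pattern into a regular expression. This Python sketch is an illustration, not the MI SQL implementation: the function names are hypothetical, and case-sensitive matching with no escape character is an assumption.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


for name in ("Central Park", "Park", "Parking Lot", "Plaza"):
    print(name, like(name, "%Park%"))
```

The two wildcards can be combined; for example, the pattern "_ark%" requires exactly one character before "ark" and allows anything after it.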


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces in their names or other special characters.

Examples

Identifiers containing illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
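When filter expressions are assembled in code, the quote rules can be applied with small helpers. The Python sketch below is illustrative; the helper names and the sample values are hypothetical, and identifiers that themselves contain double quotes are not handled because the guide does not describe that case.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes (needed for spaces or special characters)."""
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
business = quote_literal("Joe's Diner")
print(f"SELECT * FROM {table} WHERE Business = {business}")
```

Building literals this way keeps an apostrophe in the data from ending the string early.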


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The length of the geometry

Examples

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and any holes). Curves are considered as thin polygons.
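The "sum of the lengths of its rings" rule can be sketched in Python as follows. This is a planar (Cartesian) illustration with hypothetical function names, not the UDF's implementation; spherical computation and unit conversion are left out.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ring_length(ring: Sequence[Point]) -> float:
    """Length of a closed ring; the last vertex connects back to the first."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def polygon_perimeter(exterior: Sequence[Point], holes: Sequence[Sequence[Point]] = ()) -> float:
    """Perimeter = exterior ring length plus the length of every hole."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_perimeter(exterior))
print(polygon_perimeter(exterior, [hole]))
```

Note that adding a hole increases the perimeter even though it reduces the area.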

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN The geometry coordinates are interpreted using cartesian logic.

SPHERICAL The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles
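For reference, the unit codes in the table can be related to one another through meters. The Python sketch below uses the standard definitions of these units; the dictionary and function names are hypothetical, and the product's own conversion routines may differ in rounding.

```python
# Meters per unit, keyed by the unit codes listed in the table above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a length between two unit codes by going through meters."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert_length(1.0, "mi", "ft"))
print(convert_length(50.0, "km", "mi"))
```

An unknown unit code raises a KeyError, which makes typos in a unit string easy to spot.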

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;

Processing Functions

The following Processing functions are available

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The geometry to buffer.
offset | Number | The distance from the input geometry.
linearUnits | String | The unit type of the offset distance. For valid values, see Linear Units below.
resolution | Number | Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.

Valid computationType values depend on the coordinate system:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles

Return Values

Return Type | Description
WritableGeometry | A geometry consisting of all the points within the offset distance of the input geometry.

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
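To make the resolution parameter concrete, here is a minimal Python sketch of a point buffer under CARTESIAN logic. The function name is hypothetical, and placing the vertices on the circle itself is an assumption of this sketch; the product may position them differently.

```python
import math

def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer with `resolution` straight segments."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

octagon = buffer_point(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)  # number of distinct vertices for resolution 8
```

A larger resolution yields more vertices per buffer, which is why such buffers take more time and space to compute.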


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
WritableGeometry | The convex hull of the geometry.

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
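As an illustration of what a convex hull is, the sketch below applies Andrew's monotone chain algorithm to the vertices of the MULTIPOLYGON in the last example. This is a generic textbook technique chosen for the sketch, not a description of how the ConvexHull UDF is implemented; the function names are hypothetical.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop vertices that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

vertices = [(40, 40), (20, 45), (45, 30), (20, 35), (10, 30), (10, 10),
            (30, 5), (45, 20), (30, 20), (20, 15), (20, 25)]
hull = convex_hull(vertices)
print(hull)
```

Interior vertices, including every vertex of the hole ring, do not appear in the hull.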


Intersection

Description

An intersection is the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The Intersection function returns the geometry consisting of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
WritableGeometry | The geometry formed from the direct positions that are common to both input geometries.

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The source input geometry.
CRS | String | The destination coordinate system for the geometry.

Return Values

Return Type | Description
WritableGeometry | The geometry transformed to the destination coordinate system.

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
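For intuition about what a transformation from epsg:4326 to epsg:3857 does to a single point, the following Python sketch applies the standard spherical Web Mercator forward formula. It is an independent illustration with a hypothetical function name, not the UDF's implementation, and it covers only this one pair of coordinate systems.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS 84 semi-major axis)

def wgs84_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

x, y = wgs84_to_web_mercator(30.0, 20.0)
print(round(x, 3), round(y, 3))
```

Longitude maps linearly to x, while latitude is stretched increasingly toward the poles.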


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.

Return Values

Return Type | Description
WritableGeometry | The geometry that represents the union of the input geometries.

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
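The bounds-filtering use case described above can be sketched in a few lines of Python. The helper names are hypothetical; the four values returned by mbr correspond conceptually to what ST_XMin, ST_XMax, ST_YMin, and ST_YMax report for a geometry.

```python
def mbr(coords):
    """Minimum bounding rectangle of [(x, y), ...] as (x_min, x_max, y_min, y_max)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), max(xs), min(ys), max(ys)

def within_mbr(point, bounds):
    """True if the point falls inside (or on the edge of) the rectangle."""
    x_min, x_max, y_min, y_max = bounds
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max

triangle = [(40, 40), (20, 45), (45, 30), (40, 40)]
bounds = mbr(triangle)
xy_table = [(25, 35), (50, 35), (30, 50)]
print([p for p in xy_table if within_mbr(p, bounds)])
```

An MBR test is a coarse filter: a point inside the rectangle is not necessarily inside the geometry itself.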


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The maximum X ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The minimum X ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The maximum Y ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The minimum Y ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
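The relationship between a geohash ID and its rectangular cell can be shown with the publicly documented geohash scheme. The Python sketch below assumes that the product's IDs follow that standard scheme, which the guide does not state explicitly; the function name is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_bounds(geohash):
    """Decode a geohash into (lon_min, lat_min, lon_max, lat_max)."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi

print(geohash_bounds("ezs42"))
```

Each additional character halves the cell several more times, which is why longer IDs describe smaller cells.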


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The geohash ID of the grid cell at the specified precision that contains the point.

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
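The aggregation pattern in the last example (hash each point, then group by the hash) can be sketched in Python using the publicly documented geohash encoding. Whether GeoHashID produces exactly these strings is an assumption of this sketch; the function name and sample coordinates are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    bits = []
    is_lon = True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            bits.append(1 if lon >= mid else 0)
            if lon >= mid:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            if lat >= mid:
                lat_lo = mid
            else:
                lat_hi = mid
        is_lon = not is_lon
    chars = []
    for i in range(0, len(bits), 5):  # five bits per base-32 character
        value = 0
        for bit in bits[i:i + 5]:
            value = value * 2 + bit
        chars.append(BASE32[value])
    return "".join(chars)

coordinates = [(-5.6, 42.6), (-5.61, 42.61), (10.0, 50.0)]
quantity = Counter(geohash_encode(x, y, 5) for x, y in coordinates)
print(quantity["ezs42"])  # the first two points share one cell
```

Because nearby points share a prefix, sorting or grouping by the ID clusters records spatially.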


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries within which the search point lies are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'

Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table being used to get the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
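Conceptually, a point-in-polygon search tests the input point against each polygon in the data source and returns the attributes of those that contain it. The Python sketch below uses the classic ray-casting test on simple rings; it is a generic illustration with hypothetical names and toy data, not the product's search implementation, and it ignores holes and points exactly on an edge.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count edge crossings of a ray running in the +x direction."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def polygons_containing(point, polygons):
    """Return the names of all polygons (name -> closed ring) containing the point."""
    x, y = point
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]

# Hypothetical stand-in for a polygon data source.
states = {
    "west": [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)],
    "east": [(10, 0), (20, 0), (20, 10), (10, 10), (10, 0)],
}
print(polygons_containing((5, 5), states))
```

Testing every polygon for every point scales poorly, which is one reason a prepared index over the data source can help.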


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', 'Name Like "Park"'

Return Values

Return Type | Description
geometry | The nearest geometry or geometries, contained in the specified TAB or shapefile, to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like "Park"')) nearestresult;

In the above example, id is a field from the search_points table, which is the table being used to get the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
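How the maxCandidates and maxDistance options shape the result set can be sketched in Python. This is a simplified illustration with hypothetical names that uses planar distances between points only; the UDTF itself searches geometries in a TAB or shapefile and honors distance units.

```python
import math

def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to `max_candidates` (name, distance) pairs, nearest first.

    candidates: list of (name, (x, y)) tuples.
    max_distance: None means no limit, mirroring the documented default.
    """
    scored = sorted(
        ((name, math.dist(point, xy)) for name, xy in candidates),
        key=lambda item: item[1],
    )
    if max_distance is not None:
        scored = [item for item in scored if item[1] <= max_distance]
    return scored[:max_candidates]

capitals = [("A", (0, 3)), ("B", (4, 0)), ("C", (10, 10))]
print(search_nearest((0, 0), capitals))
print(search_nearest((0, 0), capitals, max_candidates=3, max_distance=5))
```

With the defaults, only the single nearest record is returned; raising max_candidates returns more records, and max_distance discards those that are too far away.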


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
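
To give a feel for what the geohashPrecision value controls, the following is a minimal Python sketch of the standard public geohash encoding (base-32 characters, bits interleaved starting with longitude). The helper name geohash_encode is hypothetical and is not part of the LI SDK; how the product computes geohashes internally is not described here and may differ.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a WGS84 latitude/longitude as a geohash of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(40.7484, -73.9857, 5)
fine = geohash_encode(40.7484, -73.9857, 7)
```

Each extra character narrows the cell, so a higher precision yields many more distinct cells, and a shorter geohash is always a prefix of a longer one for the same point.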

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
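
The idea behind a distance join can be illustrated outside Spark. The following Python sketch is a brute-force stand-in, assuming a spherical earth and the haversine formula; the function names and sample coordinates are hypothetical, and the product's actual join (which uses a geohash-based primary join) is not reproduced here.

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean earth radius in miles (assumption)


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Pair every left record with each right record within max_miles of it."""
    rows = []
    for name1, lon1, lat1 in left:
        for name2, lon2, lat2 in right:
            d = haversine_miles(lon1, lat1, lon2, lat2)
            if d <= max_miles:
                rows.append((name1, name2, d))  # d plays the role of a distance column
    return rows


# Hypothetical sample records: (name, longitude, latitude)
customers = [("c1", -73.9857, 40.7484)]
pois = [("near", -73.9851, 40.7527), ("far", -73.9442, 40.6782)]
result = join_by_distance(customers, pois, 0.5)
```

Comparing every pair scales poorly, which is why a real implementation first narrows the candidates with a coarse spatial key before computing exact distances.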

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit \
  --class com.example.spark.app.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

• PGD Builder 81
• Download Permissions 83
• Operators and Syntax Delimiters 84
• Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
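
As a quick illustration of what the 775 mode grants, the following Python sketch decodes an octal mode with the standard library. The helper names are hypothetical and the snippet only interprets the number; it does not touch any directory.

```python
import stat


def describe_dir_mode(mode):
    """Render an octal permission mode as an ls-style string for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True when the group write bit is set in the mode."""
    return bool(mode & stat.S_IWGRP)


shared = describe_dir_mode(0o775)   # owner and group: rwx, others: r-x
default = describe_dir_mode(0o755)  # group cannot write
```

With 775, every member of the common group can write into the download directory, which is what lets several services update the same downloaded data.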

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or sub query.

Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
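
The wildcard and range semantics in the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matching is case-sensitive here, and the helper names like_match and between are hypothetical rather than part of MI SQL.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like_match(value, pattern):
    """True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, low, high):
    """Inclusive range test, mirroring the Between operator."""
    return low <= value <= high
```

For instance, the pattern Can% matches both Can and Canada, while C_nada matches Canada but not Cnada.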

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
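
When a query string is assembled programmatically, the doubling rule for single quotes can be applied by a small helper. The following Python sketch is a minimal illustration; the function names and the sample business name are hypothetical, and the identifier helper assumes the name itself contains no double-quote characters.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (assumes no embedded double quotes)."""
    return '"' + name + '"'


# Hypothetical table, column, and value used only for illustration
query = (
    "SELECT * FROM " + quote_identifier("My Streets")
    + " WHERE Business = " + quote_literal("Joe's Diner")
)
```

Here the identifier is quoted because it contains a space, and the apostrophe in the value is emitted as two single-quote characters.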

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
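
The statement that a polygon's perimeter is the sum of the lengths of its rings can be checked with a small Python sketch. It assumes planar (CARTESIAN-style) coordinates and closed rings whose last vertex repeats the first; the function names are hypothetical and this is not the product's implementation.

```python
import math


def ring_length(ring):
    """Sum of the straight-line segment lengths along a closed ring."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(exterior, holes=()):
    """Perimeter of a polygon: exterior ring length plus the length of every hole."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


# A 10 x 10 square with a 2 x 2 square hole
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
total = polygon_perimeter(square, [hole])
```

The exterior contributes 40 units and the hole contributes 8, so the hole increases the perimeter rather than reducing it.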

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it, which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer

offset Number The distance from the input geometry

linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
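
The effect of the resolution parameter on a point buffer can be pictured with a short Python sketch. It works in planar coordinates and places the vertices on the circle itself, which is an assumption; the product may position vertices differently, and buffer_point is a hypothetical name.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` straight segments."""
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring by repeating the first vertex
    return ring


octagon = buffer_point(5.0, 6.0, 50.0, 8)     # 8 segments: an octagon
smoother = buffer_point(5.0, 6.0, 50.0, 12)   # 12 segments: closer to a circle
```

A larger resolution produces more vertices per buffer, which is why it costs more time and space.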

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
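
For readers who want to see what a convex hull computation involves, the following Python sketch applies Andrew's monotone chain algorithm to a list of planar points. It is a generic textbook technique chosen for illustration, not necessarily what the ConvexHull function uses internally, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```

The interior point (1, 1) is discarded because it lies inside the square formed by the other four points.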

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
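
To show what a transformation between the two coordinate systems used in the first example amounts to, the following Python sketch applies the widely published spherical Web Mercator formulas for converting WGS84 longitude/latitude (epsg:4326) to epsg:3857. It is a standalone illustration with a hypothetical function name, not the code path the Transform function uses.

```python
import math

WGS84_RADIUS_M = 6378137.0  # WGS84 semi-major axis in meters


def lonlat_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = WGS84_RADIUS_M * math.radians(lon)
    y = WGS84_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


x, y = lonlat_to_web_mercator(30.0, 20.0)
```

The x value grows linearly with longitude, while y stretches increasingly toward the poles, which is why Web Mercator coordinates are not suitable for measuring true distances at high latitudes.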

Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR for a writable geometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
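
The MBR values that the ST_XMin, ST_XMax, ST_YMin, and ST_YMax functions expose, and the kind of bounds filter they enable on an XY table, can be sketched in Python as follows. The function names are hypothetical and the coordinates are plain tuples rather than WritableGeometry objects.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounds(x, y, bounds):
    """True when the point (x, y) falls inside or on the edge of the MBR."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax


# Ring of a triangle, with the first vertex repeated to close it
triangle = [(40, 40), (20, 45), (45, 30), (40, 40)]
bounds = mbr(triangle)

# Filter XY records by the bounds of the geometry
records = [(25, 35), (50, 50), (20, 30)]
kept = [r for r in records if in_bounds(r[0], r[1], bounds)]
```

Filtering on the MBR is a cheap first pass; a point inside the MBR is not necessarily inside the geometry itself.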

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs, one for each of three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and for interoperability with other systems.

Square hash is similar to geohash, but has the advantage that the cells appear as squares when displayed in Popular Mercator.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at a specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
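The relationship between a geohash ID and its rectangular cell can be sketched in a few lines of Python. This sketch assumes the publicly documented geohash scheme (base-32 characters, bits alternating between longitude and latitude, longitude first); whether the product's GeoHashBoundary produces exactly the same cells is an assumption, and the function name `geohash_bounds` is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(geohash: str):
    """Return (xmin, ymin, xmax, ymax) of the cell named by a geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate between axes, starting with longitude
    for ch in geohash.lower():
        value = BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):  # each character carries 5 bits
            bit = (value >> shift) & 1
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the interval
            else:
                rng[1] = mid  # keep the lower half of the interval
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


print(geohash_bounds("dre"))
```

Each additional character halves the cell five more times, which is why longer IDs correspond to smaller cells.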

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;
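For readers who want to see how a point becomes a hash ID and how IDs support aggregation, the following Python sketch encodes points with the publicly documented geohash algorithm and counts records per cell, mirroring the GROUP BY query above. It assumes that GeoHashID follows the standard scheme, which this guide does not confirm; `geohash_id` and the sample points are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x: float, y: float, precision: int) -> str:
    """Encode longitude x and latitude y (WGS 84) as a geohash of the given length."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    chars = []
    use_lon = True  # bits alternate between axes, starting with longitude
    value = 0
    bits = 0
    while len(chars) < precision:
        rng, coord = (lon, x) if use_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)


# Count records per cell at precision 3 (hypothetical sample points).
points = [(-73.750333, 42.736103), (-73.76, 42.74), (10.40744, 57.64911)]
quantity = Counter(geohash_id(x, y, 3) for x, y in points)
print(quantity)
```

Because a shorter ID is always a prefix of the longer ID for the same point, sorting or prefix-matching on the ID groups nearby cells together.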

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at a specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier of a hexagon cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at a specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier of a square hash cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates
ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
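As background on what a point-in-polygon search does, here is a minimal Python sketch of the classic ray-casting test, applied to a small set of named polygons. This is a generic textbook technique, not the algorithm used by LocalPointInPolygon (which this guide does not describe); the function names and sample polygons are hypothetical, and points that fall exactly on an edge are not handled specially.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, closed or open."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the edge crosses the ray to the right of the point
                inside = not inside
    return inside


def search_polygons(x, y, polygons):
    """Return the names of all polygons that contain the point."""
    return [name for name, ring in polygons.items() if point_in_polygon(x, y, ring)]


# Hypothetical polygons keyed by an attribute value.
polygons = {
    "west": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "east": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
print(search_polygons(2, 2, polygons))
```

A brute-force scan like this tests every polygon for every point, which is the cost that spatial indexes such as the PGD index are meant to reduce.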

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option: maxCandidates
Description: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'

Option: maxDistance
Description: The maximum distance to search for results (if not set, the default value is no limit).
Example: 'maxDistance', '25'

Option: distanceUnit
Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'

Option: returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Option: queryFilter
Description: Search on a subset of the data based on an attribute.
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Example: 'queryFilter', 'Name Like Park'

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
  map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
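The interaction of the maxCandidates and maxDistance options can be illustrated with a short Python sketch: candidates beyond the maximum distance are discarded, the rest are sorted by distance, and at most the requested number is returned. This is a conceptual model only; the haversine great-circle distance stands in for whatever distance calculation the product uses, and the function names, defaults expressed in meters, and sample cities are assumptions made for the illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    px, py = point
    scored = []
    for name, (cx, cy) in candidates.items():
        d = haversine_m(px, py, cx, cy)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Hypothetical candidate points (lon, lat) and a search point.
capitals = {
    "Albany": (-73.7562, 42.6526),
    "Boston": (-71.0589, 42.3601),
    "Los Angeles": (-118.2437, 34.0522),
}
search_point = (-73.6918, 42.7284)
print(search_nearest(search_point, capitals, max_candidates=2))
print(search_nearest(search_point, capitals, max_candidates=3, max_distance_m=50_000))
```

With no options set, the sketch returns a single nearest candidate with no distance limit, matching the defaults listed in the Options table.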

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the product always generates the same hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

// col() comes from Spark SQL; joinByDistance is added to DataFrame by SpatialImplicits
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
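The semantics of a distance join can be sketched without Spark. The following Python example pairs every record in one list with every record in another that lies within a maximum distance, and returns the distance alongside each pair, similar to setting DistanceColumnName. It is a brute-force illustration under stated assumptions: it omits the geohash-based primary join that the geohashPrecision parameter controls, uses the haversine formula as a stand-in distance calculation, and all names and sample records are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Return (left_id, right_id, distance_m) for every pair within max_distance_m."""
    rows = []
    for lid, llon, llat in left:
        for rid, rlon, rlat in right:
            d = haversine_m(llon, llat, rlon, rlat)
            if d <= max_distance_m:
                rows.append((lid, rid, d))
    return rows


# Hypothetical customers and points of interest as (id, lon, lat).
customers = [("c1", -73.750333, 42.736103)]
pois = [("p1", -73.7500, 42.7400), ("p2", -73.0, 42.0)]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print(result)
```

Comparing every pair is quadratic in the number of records; bucketing both sides by a grid cell such as a geohash first, as the geohashPrecision parameter suggests the product does, limits the exact distance check to nearby records.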

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which has a PGD file created for each sub-TAB.

Also, the system no longer uses a PGD file if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
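To see what the 775 mode grants, the following Python sketch decodes an octal mode into the familiar permission string and checks the group bits. It is an illustration only; the helper names are hypothetical and are not part of the product.

```python
import stat

def describe_mode(mode):
    """Render a directory permission mode (for example 0o775) as an ls-style string."""
    return stat.filemode(stat.S_IFDIR | mode)

def group_has_full_access(mode):
    """True if the group has read, write, and execute permission."""
    return (mode & stat.S_IRWXG) == stat.S_IRWXG

print(describe_mode(0o775))
print(group_has_full_access(0o775))
```

With 775, the owner and the group can read, write, and traverse the directory, while all other users can only read and traverse it.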


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
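The wildcard rules for Like can be sketched in Python by translating a pattern into a regular expression. This is a minimal illustration of the semantics described in the table, not the MI SQL implementation; the function names are hypothetical, and case sensitivity and escape characters are not addressed because the guide does not specify them.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("Canada", "Can%"))
print(like("Canada", "C_nada"))
```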


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers containing illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
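When a query is assembled in application code, the single-quote doubling rule can be applied with a small helper. The Python sketch below is an illustration only; `quote_literal` is a hypothetical name, and it covers only the doubling rule stated above.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

business = "O'hara's"
query = "SELECT * FROM Streets WHERE Business = " + quote_literal(business)
print(query)
```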


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles

Return Values

Return Type Description

Double The perimeter of the geometry

Examples

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
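To make the unit codes concrete, the following Python sketch computes a planar (Cartesian) perimeter for a closed ring whose coordinates are in meters and converts the result to one of the linear units listed above. It is an illustration, not the product's implementation: the function and the conversion table are assumptions introduced here, and the SPHERICAL computation type is not modelled.

```python
import math

# Meters per unit for the linear unit codes in the table above.
# This dictionary is a hypothetical helper, not part of the product API.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,
    "nmi": 1852.0,
}

def perimeter(ring, unit="m"):
    """Planar perimeter of a closed ring (coordinates in meters), expressed in `unit`."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += math.hypot(x2 - x1, y2 - y1)
    return total / METERS_PER_UNIT[unit]

square = [(0, 0), (1000, 0), (1000, 1000), (0, 1000), (0, 0)]
print(perimeter(square, "km"))
```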


Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])

Parameters

Parameter Type Description

geometry WritableGeometry The geometry to buffer.

offset Number The distance from the input geometry.

linearUnit String The desired return unit type. For valid values, see Linear Units on page 40.

resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles


Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
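The effect of the resolution parameter can be pictured with a short Python sketch that approximates the buffer of a point with a regular polygon. This is a simplified Cartesian illustration under stated assumptions: `point_buffer` is a hypothetical helper, it returns a plain list of vertices rather than a MultiPolygon, and it ignores linear units and spherical computation.

```python
import math

def point_buffer(x, y, offset, resolution):
    """Closed ring of `resolution` straight segments approximating a circle of radius `offset`."""
    ring = []
    for i in range(resolution):
        angle = 2.0 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

octagon = point_buffer(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)  # number of straight-line segments
```

A higher resolution gives a closer approximation of a circle at the cost of more vertices, which is why larger values take more time and space to compute.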


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
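For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. It illustrates the concept only; the guide does not state which algorithm the ConvexHull UDF uses, and `convex_hull` is a hypothetical helper.

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]))
```

The interior point (2, 2) is dropped because it does not lie on the boundary of the smallest convex shape enclosing all the points.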


Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
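As a worked example of what a coordinate transformation does, the Python sketch below applies the standard spherical Web Mercator formulas that convert longitude/latitude in degrees (EPSG:4326) to meters (EPSG:3857). It is an illustration of that one conversion only; `lonlat_to_web_mercator` is a hypothetical helper, and the Transform UDF handles arbitrary coordinate systems internally.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS 84 semi-major axis)

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y

print(lonlat_to_web_mercator(30.0, 20.0))
```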


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
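The bounds-filtering use case can be sketched in Python: compute the MBR of a geometry's coordinates, then keep only the XY records that fall inside it. This is a conceptual illustration with hypothetical helper names; in Hive the four bound values would come from the ST_XMin, ST_YMin, ST_XMax, and ST_YMax UDFs.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

def in_mbr(x, y, bounds):
    """True if the point lies inside or on the edge of the bounding rectangle."""
    x_min, y_min, x_max, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max

bounds = mbr([(1, 5), (3, 2), (-2, 4)])
records = [(0, 3), (4, 3), (-2, 2)]
print([r for r in records if in_mbr(r[0], r[1], bounds)])
```

Because an MBR test is only a comparison of four numbers per record, it is a cheap first filter before any exact geometric predicate is applied.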


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash, but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
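The relationship between an ID and its rectangular cell can be sketched in Python by decoding a geohash string into its bounding box. The sketch assumes the widely used public geohash encoding (base-32 alphabet, bits alternating between longitude and latitude); the guide does not confirm that the UDF uses exactly this scheme, and `geohash_bounds` is a hypothetical helper.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet (assumed)

def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of the cell named by a geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    even = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon if even else lat
            mid = (rng[0] + rng[1]) / 2.0
            if bit:
                rng[0] = mid  # keep the upper half of the range
            else:
                rng[1] = mid  # keep the lower half of the range
            even = not even
    return lon[0], lat[0], lon[1], lat[1]

print(geohash_bounds("e"))
```

Each additional character halves the cell five more times, which is why longer IDs describe smaller cells.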


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
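How precision maps to the length of the ID can be sketched in Python with the widely used public geohash encoding. This is an assumption made for illustration: the guide calls the ID "well-known" but does not spell out the algorithm, and `geohash_id` is a hypothetical helper rather than the UDF itself.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet (assumed)

def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash string of `precision` characters."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    value = 0
    bits = 0
    even = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)

print(geohash_id(-73.750333, 42.736103, 3))
```

Under this encoding, nearby points share a common prefix, which is what makes the IDs useful for sorting, grouping, and prefix searches such as the GROUP BY example above.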


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
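The link between precision, key length, and cell size can be illustrated with a generic quadtree key on a Mercator grid. The Python sketch below is not the product's SquareHashID implementation (this guide does not describe the actual ID scheme); the function name quadtree_key, the digit alphabet 0-3, and the latitude clamp are assumptions chosen for illustration. It shows that every extra character splits a cell into four smaller cells, and that a longer key always begins with the key of the enclosing, coarser cell.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the square Mercator world (assumption)


def quadtree_key(lon, lat, precision):
    """Return a key of `precision` digits (0-3) for the cell containing the point."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    # Normalize the point to [0, 1) x [0, 1) in spherical Mercator space.
    x = (lon + 180.0) / 360.0
    s = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)
    n = 1 << precision  # number of cells per axis at this precision
    col = min(n - 1, max(0, int(x * n)))
    row = min(n - 1, max(0, int(y * n)))
    digits = []
    for level in range(precision, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if col & mask else 0) + (2 if row & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


coarse = quadtree_key(-73.750333, 42.736103, 3)
fine = quadtree_key(-73.750333, 42.736103, 10)
print(coarse, fine, fine.startswith(coarse))
```

Because keys of nested cells share a prefix, IDs of this kind are convenient to sort, search, and group on.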


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in the query result.
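Conceptually, a point-in-polygon search tests the input point against each candidate polygon and returns the attributes of every polygon that contains it. The Python sketch below uses the classic ray-casting test on simple rings. It is an illustration only and not how the UDTF is implemented; the names point_in_ring and search and the sample records are hypothetical, and the handling of points that fall exactly on a boundary is left out.

```python
def point_in_ring(x, y, ring):
    """Ray casting: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Count edges that a horizontal ray from the point towards +x crosses.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def search(x, y, records):
    """Return the attributes of every record whose polygon contains the point."""
    return [attrs for ring, attrs in records if point_in_ring(x, y, ring)]


# Hypothetical data standing in for rows of a TAB file or shapefile.
records = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], {"state": "A", "capital": "Alpha"}),
    ([(4, 0), (8, 0), (8, 4), (4, 4)], {"state": "B", "capital": "Beta"}),
]
print(search(2, 2, records))
```

Each returned dictionary plays the role of one output row of the lateral view, which is why the query can select fields such as capital and state from the result alias.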

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
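To make the join semantics concrete, the following plain Python sketch pairs every record of one list with every record of another that lies within a maximum distance, and attaches the computed distance in the same way the DistanceColumnName option does. It is a brute-force illustration under stated assumptions, not the Spark API: the names haversine_m and join_by_distance, the sample coordinates, and the mean Earth radius are hypothetical, the product may compute distances differently, and the geohash-based primary join controlled by geohashPrecision is not modeled.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters (assumption)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair each left record with every right record within max_distance_m."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_m(l["lon"], l["lat"], r["lon"], r["lat"])
            if d <= max_distance_m:
                row = dict(l)
                row.update({"poi_" + k: v for k, v in r.items()})
                row["distance_m"] = d
                joined.append(row)
    return joined


customers = [{"id": 1, "lon": -73.750333, "lat": 42.736103}]
pois = [
    {"name": "near", "lon": -73.7450, "lat": 42.7361},
    {"name": "far", "lon": -73.6000, "lat": 42.7361},
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([(row["id"], row["poi_name"]) for row in result])
```

A brute-force join compares every pair of records, which is why a distributed implementation first buckets points into grid cells and only compares points in the same or neighboring cells.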


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which has a PGD file created for each sub-TAB.

Also, the system stops using a PGD file if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that contains all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).

5. For the download directory specified in the pb.download.location property, change the group to the common operating system group and set the permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
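As a quick check of what mode 775 grants, the Python sketch below decodes an octal mode into the familiar rwx string using the standard library. The helper name describe_dir_mode is hypothetical and the sketch does not touch any real directory.

```python
import stat


def describe_dir_mode(mode):
    """Render an octal permission mode for a directory, as ls -l would show it."""
    return stat.filemode(stat.S_IFDIR | mode)


mode = 0o775
print(describe_dir_mode(mode))
# The owner and the group get read, write, and execute; everyone else gets
# read and execute only, so users outside the group cannot update the data.
print("others can write:", bool(mode & stat.S_IWOTH))
```

This is why every service user that needs to update the downloaded data has to be a member of the common group.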


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first objects centroid is within the second object

Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object

In (List) Returns true if equals at least one of the values in the literal list or sub query

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
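The wildcard rules of the Like operator can be illustrated by translating a pattern into a regular expression. The Python sketch below is an illustration of the two wildcards only, not the MI SQL engine; the function names are hypothetical, and the sketch assumes case-sensitive matching, which this guide does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))
print(like("Parks", "Park_"))
print(like("Park", "Park_"))
```

The last call shows the difference between the two wildcards: an underscore must consume exactly one character, while a percent sign may match nothing at all.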


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain characters that would not normally be allowed (such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
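When filter expressions are assembled programmatically, the doubling rule can be applied with a small helper. The Python sketch below is a minimal illustration; the function name sql_string_literal and the sample value are hypothetical, and it covers only the single-quote rule described above.

```python
def sql_string_literal(value):
    """Wrap a value in single quotes, doubling any single quote inside it."""
    return "'" + value.replace("'", "''") + "'"


business = "O'Hara's"
query = "SELECT * FROM Streets WHERE Business = " + sql_string_literal(business)
print(query)
```

Escaping the value before it is placed in the expression keeps an apostrophe in the data from ending the literal early.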


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives No part of this document may be reproduced or transmitted in any form or by any means electronic or mechanical including photocopying without the written permission of Pitney Bowes Software Inc 350 Jordan Road Troy New York 12180 copy 2020 Pitney Bowes Software Inc All rights reserved Location Intelligence APIs are trademarks of Pitney Bowes Software Inc All other marks and trademarks are property of their respective holders

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

Processing Functions

The following Processing functions are available:

• Buffer
• ConvexHull
• Intersection
• Transform
• Union

Buffer

Description

The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])

Parameters

Parameter        Type              Description
geometry         WritableGeometry  The geometry to buffer.
offset           Number            The distance from the input geometry.
linearUnit       String            The desired return unit type. For valid values, see Linear Units.
resolution       Number            Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type:

Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles

Return Values

Return Type       Description
WritableGeometry  A geometry consisting of all the points within the offset distance of the input geometry.

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
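To make the resolution parameter concrete, the following Python sketch approximates a circular buffer around a point with a fixed number of straight segments. It is a simplified planar (CARTESIAN-style) illustration, not the product's algorithm: the function name buffer_point is hypothetical, and placing the vertices on the circle itself is an assumption.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with
    `resolution` straight segments; returns a closed ring of (x, y) pairs."""
    if resolution < 3:
        raise ValueError("resolution must be at least 3")
    ring = []
    for i in range(resolution):
        angle = 2.0 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = buffer_point(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)  # number of distinct vertices
```

A larger resolution adds vertices, which is why higher values cost more time and space while giving a smoother outline.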

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The input geometry.

Return Values

Return Type       Description
WritableGeometry  The convex hull of the geometry.

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(hivetable.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
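As an illustration of what a convex hull is, the Python sketch below computes the hull of the vertices of the MULTIPOLYGON used in the last example. It uses Andrew's monotone chain algorithm on planar coordinates, which is a standard textbook method and is not claimed to be the one the ConvexHull function uses internally; the function names are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


vertices = [(40, 40), (20, 45), (45, 30), (20, 35), (10, 30), (10, 10),
            (30, 5), (45, 20), (30, 20), (20, 15), (20, 25)]
hull = convex_hull(vertices)
print(hull)
```

Vertices that lie inside the outline, such as those of the interior ring, are discarded; only the outermost points remain.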

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type       Description
WritableGeometry  The geometry formed from the direct positions that are common to both input geometries.

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The source input geometry.
CRS        String            The destination coordinate system for the geometry.

Return Values

Return Type       Description
WritableGeometry  The geometry transformed to the destination coordinate system.

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
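For readers who want to see what a transformation from epsg:4326 to epsg:3857 involves for a single point, the Python sketch below applies the widely published spherical Mercator formulas. It is a sketch under stated assumptions: the function name is hypothetical, the input is taken to be longitude and latitude in degrees, and the product's Transform function may use a different implementation.

```python
import math

# Sphere radius used by the spherical (Web) Mercator projection, in meters.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to spherical Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y


x, y = wgs84_to_web_mercator(30.0, 20.0)
print(x, y)
```

The X value scales linearly with longitude, while the Y value grows faster toward the poles, which is why the projection is only used between roughly 85 degrees south and north.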

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.

Return Values

Return Type       Description
WritableGeometry  The geometry that represents the union of the input geometries.

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another by Transform alone. The ST_X and ST_Y UDFs extract the ordinates of the transformed point, which allows an XY table to be transformed from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
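The bounds-filtering use case can be pictured with a small Python sketch. It computes the MBR of a list of coordinates and keeps only the XY rows that fall inside it. The function names and sample data are hypothetical and only mirror the idea behind ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(rows, bounds):
    """Keep the (x, y) rows that fall inside or on the bounding rectangle."""
    x_min, y_min, x_max, y_max = bounds
    return [(x, y) for x, y in rows
            if x_min <= x <= x_max and y_min <= y <= y_max]


polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
xy_table = [(1.0, 1.0), (5.0, 1.0), (2.0, 3.0)]
bounds = mbr(polygon)
kept = filter_by_bounds(xy_table, bounds)
print(bounds, kept)
```

An MBR test is cheap, so it is commonly used as a first pass before a more exact geometric predicate is evaluated.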

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The input geometry.

Return Values

Return Type  Description
Double       The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The input geometry.

Return Values

Return Type  Description
Double       The maximum X value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The input geometry.

Return Values

Return Type  Description
Double       The minimum X value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The input geometry.

Return Values

Return Type  Description
Double       The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The input geometry.

Return Values

Return Type  Description
Double       The maximum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter  Type              Description
geometry   WritableGeometry  The input geometry.

Return Values

Return Type  Description
Double       The minimum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter  Type    Description
UNIQUE_ID  String  The unique geohash identifier of a cell in a grid.

Return Values

Return Type       Description
WritableGeometry  A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type       Description
WritableGeometry  The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
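The relationship between a geohash ID and its rectangular cell can be sketched in Python with the publicly documented geohash scheme (base-32 characters, bits alternating between longitude and latitude). This is an assumption-laden illustration: the function name is hypothetical, and it is assumed, not confirmed by this guide, that the product's IDs follow the standard scheme exactly.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(geohash):
    """Decode a standard geohash into (x_min, y_min, x_max, y_max)."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2.0
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


bounds = geohash_bounds("dre")
print(bounds)
```

Each additional character halves the cell several more times, which is why longer IDs correspond to smaller cells.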

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type  Description
String       The geohash ID of the grid cell at the specified precision that contains the point.

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
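The aggregation pattern in the last query (hash each point, then count per cell) can be sketched outside Hive in Python. The encoder below follows the publicly documented geohash scheme; whether the product's GeoHashID output matches it character for character is an assumption, and the function name and sample points are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode longitude/latitude as a standard geohash of `precision` characters."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


points = [(-73.750333, 42.736103), (-73.9, 42.8), (2.35, 48.85)]
counts = Counter(geohash_id(x, y, 3) for x, y in points)
print(counts)
```

Because nearby points share an ID prefix, grouping by the ID is enough to aggregate by cell without any geometric comparison.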

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter  Type    Description
UNIQUE_ID  String  The unique identifier of a cell in a grid.

Return Values

Return Type       Description
WritableGeometry  A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type       Description
WritableGeometry  The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type  Description
String       The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter  Type    Description
UNIQUE_ID  String  The unique identifier of a cell in a grid.

Return Values

Return Type       Description
WritableGeometry  A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type       Description
WritableGeometry  The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type  Description
String       The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter       Type              Description
inputPoint      WritableGeometry  A WritableGeometry representing a point.
dataSourcePath  String            The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
options         Map               Optional. Options that allow you to set return criteria, in <String, String> format.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

Options

shpCharset
    Description: the charset to use when reading a shapefile
    Example: 'shpCharset', 'utf-8'

shpCrs
    Description: the coordinate reference system to use when reading a shapefile
    Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
    Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
    Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
    Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
    Example: 'downloadLocation', '/pb/downloads'
    Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
    Description: the operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions.
    Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry  The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
    FromWKT(pip_points.geometry, pip_points.crs),
    'STATECAP.TAB',
    map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
        'downloadLocation', '/pb/pip/download',
        'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
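Conceptually, a point-in-polygon lookup tests the input point against each candidate polygon and returns the attributes of the polygons that contain it. The Python sketch below shows this with the classic ray-casting test on planar coordinates. It is a brute-force illustration under stated assumptions: the function names and sample records are hypothetical, points exactly on a boundary are not handled specially, and the product (particularly with PGD indexes) is expected to work differently internally.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count edge crossings of a ray going in the +X direction."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y):  # edge straddles the horizontal line through y
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def local_point_in_polygon(x, y, records):
    """Return the attribute dicts of every polygon containing the point."""
    return [attrs for attrs, ring in records if point_in_ring(x, y, ring)]


# Hypothetical data standing in for rows of a TAB file.
records = [
    ({"state": "West", "capital": "Alpha"}, [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ({"state": "East", "capital": "Beta"}, [(10, 0), (20, 0), (20, 10), (10, 10)]),
]
matches = local_point_in_polygon(5, 5, records)
print(matches)
```

Each match corresponds to one output row of the lateral view, which is why a point that falls in no polygon produces no rows unless an outer lateral view is used.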

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries (contained in the specified TAB or shapefile) that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter       Type              Description
inputPoint      WritableGeometry  A WritableGeometry representing the point to search near.
dataSourcePath  String            The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
options         Map               Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

Options

maxCandidates
    Description: the maximum number of results to return (if not set, the default value is 1)
    Example: 'maxCandidates', '3'

maxDistance
    Description: the maximum distance to search for results (if not set, the default value is no limit)
    Example: 'maxDistance', '25'

distanceUnit
    Description: the distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units.
    Example: 'distanceUnit', 'mi'

returnDistanceColumnName
    Description: the name of the column to use for returning the distance
    Example: 'returnDistanceColumnName', 'Miles'

shpCharset
    Description: the charset to use when reading a shapefile
    Example: 'shpCharset', 'utf-8'

shpCrs
    Description: the coordinate reference system to use when reading a shapefile
    Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
    Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
    Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
    Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
    Example: 'downloadLocation', '/pb/downloads'
    Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
    Description: the operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions.
    Example: 'downloadGroup', 'pbdownloads'

queryFilter
    Description: search on a subset of the data based on an attribute
    Example: 'queryFilter', "Name Like 'Park'"
    Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map(
    'maxCandidates', '3',
    'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
    'downloadLocation', '/pb/search/download',
    'downloadGroup', 'pbdownloads',
    'queryFilter', "Name Like 'Park'"
  )
) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large data sets, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
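Before submitting the job to a cluster, it can help to assemble the command in a script and print it for review. The following is a minimal dry-run sketch, not part of the product: the function name build_hexgen_cmd, the application name, and the hexgen.conf file name are hypothetical, and the driver class name is written as reconstructed in the command above.

```shell
#!/usr/bin/env bash
# Dry run: assemble the Hexagon Generator command and print it instead of running it.
# The function name and all argument values are placeholders (assumptions).
build_hexgen_cmd() {
  local app_name=$1 jar=$2 output_dir=$3 config=$4
  local cmd=(
    spark-submit
    --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver
    --master yarn
    --deploy-mode cluster
    --name "$app_name"
    "$jar"
    -output "$output_dir"
    -conf "$config"
    -overwrite
  )
  # %q quotes each argument so the printed line can be pasted into a shell.
  printf '%q ' "${cmd[@]}"
  printf '\n'
}

build_hexgen_cmd "hexgen-usa" \
  "/dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar" \
  "/dir/on/hdfs/output" \
  "/dir/on/server/hexgen.conf"
```

Keeping the arguments in an array avoids word-splitting problems when a path contains spaces; to actually run the job, the array would be executed instead of printed.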

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to.

df1Longitude Column The longitude value from the first dataframe.

df1Latitude Column The latitude value from the first dataframe.

df2Longitude Column The longitude value from the second dataframe.

df2Latitude Column The latitude value from the second dataframe.

maxDistance Length The buffer length around point 1 within which to search for point 2.

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here.
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.

2. EMR - start here.
   a) On the master node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.

3. Restart Hue.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
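The EMR edit in step 2 can also be scripted. The sketch below is an illustration only: the function name enable_default_config is hypothetical, and it writes to a separate output file so the original hue.ini is never modified in place. It assumes the setting appears on its own line, commented out with one or more # characters.

```shell
#!/usr/bin/env bash
# enable_default_config IN OUT
# Copies IN to OUT, uncommenting use_default_configuration and setting it to true.
# (Hypothetical helper; review the output before replacing the real hue.ini.)
enable_default_config() {
  sed -E 's/^[#[:space:]]*use_default_configuration[[:space:]]*=.*/use_default_configuration=true/' "$1" > "$2"
}

# Demonstration on a scratch file rather than /etc/hue/conf/hue.ini.
scratch=$(mktemp -d)
printf '[desktop]\n  ## use_default_configuration=false\n' > "$scratch/hue.ini"
enable_default_config "$scratch/hue.ini" "$scratch/hue.ini.new"
cat "$scratch/hue.ini.new"
```

After checking the generated file, it would be copied over the original with the appropriate privileges before restarting the service.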

3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
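When a directory holds many TAB files, the invocations can be generated rather than typed by hand. The following sketch only prints one PGDBuilder command per TAB file so the list can be reviewed first; the function name pgd_commands and the directory layout are assumptions, and only the -f and -p parameters described above are used.

```shell
#!/usr/bin/env bash
# pgd_commands DIR PARALLEL
# Prints one PGDBuilder command line for each TAB file found directly in DIR.
# (Hypothetical helper; nothing is executed.)
pgd_commands() {
  local dir=$1 parallel=$2
  find "$dir" -maxdepth 1 -type f -iname '*.tab' | sort | while IFS= read -r tab; do
    printf 'PGDBuilder -f %s -p %s\n' "$tab" "$parallel"
  done
}

# Demonstration with empty placeholder files in a scratch directory.
scratch=$(mktemp -d)
touch "$scratch/uk.tab" "$scratch/STATES.TAB" "$scratch/readme.txt"
pgd_commands "$scratch" 4
```

The match on the file extension is case-insensitive, so both uk.tab and STATES.TAB are listed while other files are skipped.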

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should be assigned to that common group and should grant read, write, and execute permissions to the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
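The effect of the last step can be checked without touching a real cluster. The sketch below applies the same chgrp and chmod calls to a scratch directory and prints the resulting mode; the function name prepare_download_dir is hypothetical, and the demonstration uses the current user's primary group in place of pbdownloads so that it runs without sudo.

```shell
#!/usr/bin/env bash
# prepare_download_dir DIR GROUP
# Creates DIR if needed, assigns it to GROUP, and grants rwx to owner and group (775).
# (Hypothetical helper mirroring step 5; on a real cluster this would run with sudo.)
prepare_download_dir() {
  local dir=$1 group=$2
  mkdir -p "$dir"
  chgrp "$group" "$dir"
  chmod 775 "$dir"
}

# Demonstration on a scratch directory using the caller's own primary group.
scratch=$(mktemp -d)
prepare_download_dir "$scratch/downloads" "$(id -gn)"
ls -ld "$scratch/downloads"
```

The listing should begin with drwxrwxr-x, which corresponds to mode 775: full access for the owner and group, and read and execute access for everyone else.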

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
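The wildcard rules for Like can be tried out locally. The sketch below is an approximation only: it translates the two Like wildcards into shell glob wildcards and matches with a case statement. The function name like_match is hypothetical, matching here is case-sensitive (the source does not state how Like treats case), and any other glob characters in the pattern, such as [ or *, are not escaped.

```shell
#!/usr/bin/env bash
# like_match VALUE PATTERN
# Succeeds if VALUE matches PATTERN, where % stands for zero or more characters
# and _ stands for exactly one character. (Hypothetical helper, approximation only.)
like_match() {
  local value=$1 pattern
  # Translate the Like wildcards to shell glob wildcards: % -> *, _ -> ?
  pattern=$(printf '%s' "$2" | tr '%_' '*?')
  case $value in
    $pattern) return 0 ;;
    *) return 1 ;;
  esac
}

for candidate in "Central Park" "Park Avenue" "Parka"; do
  if like_match "$candidate" "%Park"; then
    printf '%s matches %%Park\n' "$candidate"
  else
    printf '%s does not match %%Park\n' "$candidate"
  fi
done
```

With the pattern %Park, only values that end in Park match; Park_ would instead require exactly one character after Park.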

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List item and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain otherwise illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a double single quote (two characters). In the following example, the string literal Ohara's is defined:

SELECT * FROM Streets WHERE Business = 'Ohara''s'
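When filter expressions are assembled in a script, the doubling rule for single quotes is easy to apply mechanically. The helpers below are a small sketch of that rule; the function names sql_literal and sql_identifier are hypothetical and are not part of the product.

```shell
#!/usr/bin/env bash
# sql_literal VALUE: wraps VALUE in single quotes, doubling any embedded single quote.
sql_literal() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# sql_identifier NAME: wraps NAME in double quotes.
# (Sketch only: embedded double quotes are not handled.)
sql_identifier() {
  printf '"%s"\n' "$1"
}

printf 'SELECT * FROM %s WHERE Business = %s\n' \
  "$(sql_identifier "Streets")" "$(sql_literal "Ohara's")"
```

The value is escaped before it is wrapped, so a literal that already contains a quote cannot terminate the string early.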

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.

All rights reserved

Parameter Type Description

computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)

• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: valid type = CARTESIAN (default)

CARTESIAN: The geometry coordinates are interpreted using cartesian logic.

SPHERICAL: The geometry coordinates are interpreted using spherical logic.

Linear Units

Valid values for unit type

Value Description

mi miles

km kilometers

in inches

ft feet

yd yards

mm millimeters

cm centimeters

m meters

survey ft US Survey feet

nmi nautical miles
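To compare distances given in different unit codes, it can help to express them all in meters. The sketch below uses the standard definitions of each unit (for example, 1 mi = 1609.344 m and 1 US survey foot = 1200/3937 m); these factors are general knowledge rather than values taken from the product, and the function name to_meters is hypothetical.

```shell
#!/usr/bin/env bash
# to_meters VALUE UNIT
# Prints VALUE converted to meters, where UNIT is one of the unit codes listed above.
# (Hypothetical helper; conversion factors are the standard unit definitions.)
to_meters() {
  awk -v value="$1" -v unit="$2" 'BEGIN {
    f["mi"] = 1609.344; f["km"] = 1000;  f["in"] = 0.0254; f["ft"] = 0.3048
    f["yd"] = 0.9144;   f["mm"] = 0.001; f["cm"] = 0.01;   f["m"]  = 1
    f["survey ft"] = 1200 / 3937;        f["nmi"] = 1852
    if (!(unit in f)) { print "unknown unit: " unit > "/dev/stderr"; exit 1 }
    printf "%.3f\n", value * f[unit]
  }'
}

to_meters 50 km
to_meters 1 mi
```

An unrecognized unit code produces an error message and a non-zero exit status instead of a silent zero.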

Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(hivetable.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));

Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries.

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be passed to it directly. By building a point from each row, transforming it, and reading the result back with the ST_X and ST_Y UDFs, you can transform an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs return the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
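The MBR filtering described above can be illustrated with a small, standalone Python sketch. It is not the UDF implementation; the helper names mbr and filter_by_mbr and the sample coordinates are hypothetical. The four values returned by mbr correspond to what ST_XMin, ST_YMin, ST_XMax, and ST_YMax report for a geometry.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep the (x, y) rows that fall inside the bounds, edges included."""
    x_min, y_min, x_max, y_max = bounds
    return [(x, y) for x, y in rows if x_min <= x <= x_max and y_min <= y <= y_max]


# Hypothetical triangle and XY table rows.
triangle = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)]
xy_rows = [(1.0, 1.0), (5.0, 5.0), (2.0, 3.0)]

bounds = mbr(triangle)
print(bounds)
print(filter_by_mbr(xy_rows, bounds))
```

An MBR test is only a coarse filter: a row can fall inside the rectangle and still lie outside the geometry itself, so an exact spatial predicate is usually applied to the rows that pass.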

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry's minimum bounding rectangle (MBR), or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry's minimum bounding rectangle (MBR), or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry's minimum bounding rectangle (MBR), or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry's minimum bounding rectangle (MBR), or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

Three types of UDFs are provided, one for each grid cell shape:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and for interoperability with other systems.

Square hash is similar to geohash, but has the advantage that the cells appear as squares when displayed in Popular Mercator.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
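To show what decoding a cell ID into a boundary involves, the following standalone Python sketch decodes an ID into its rectangular bounds. It assumes the standard public geohash encoding (base-32 alphabet, alternating longitude and latitude bits); whether the GeoHashBoundary UDF matches it in every detail is an assumption, and the function name geohash_bounds is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) for a geohash string."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in cell_id.lower():
        value = BASE32.index(ch)  # raises ValueError for an invalid character
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


print(geohash_bounds("ezs42"))
```

Each additional character halves the cell five more times, which is why longer IDs describe smaller cells.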

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
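The encode-then-aggregate pattern in the last query can be sketched in standalone Python. The sketch assumes the standard public geohash encoding; whether the GeoHashID UDF produces identical IDs is an assumption. The function name geohash_id and the sample points are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bits, is_lon = 0, 0, True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


# Hypothetical (lon, lat) rows; the first two fall in the same precision-5 cell.
points = [(-5.6, 42.6), (-5.61, 42.61), (10.40744, 57.64911)]
quantity = Counter(geohash_id(lon, lat, 5) for lon, lat in points)
for cell, count in sorted(quantity.items()):
    print(cell, count)
```

Because IDs of nearby points share a common prefix, sorting or grouping by the ID keeps spatially close records together, which is what the ORDER BY and GROUP BY examples above rely on.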

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID('-73.750333', '42.736103', 3);

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within or on are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within or on are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched for. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
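The effect of the maxCandidates and maxDistance options can be illustrated with a standalone Python sketch. It is a simplification under stated assumptions, not the UDTF's implementation: candidates are plain points rather than arbitrary geometries, distances are great-circle (haversine) distances in meters on a sphere of mean earth radius, and the names haversine_m and search_nearest are hypothetical.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    point is (lon, lat); candidates are (name, lon, lat) tuples.
    max_distance_m=None means no distance limit.
    """
    scored = []
    for name, lon, lat in candidates:
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Hypothetical candidates spaced one degree of latitude apart.
capitals = [("a", 0.0, 0.0), ("b", 0.0, 1.0), ("c", 0.0, 2.0)]
print(search_nearest((0.0, 0.0), capitals, max_candidates=3, max_distance_m=150000))
```

With the defaults (one candidate, no distance limit) the sketch returns only the single nearest record, mirroring the documented default behavior of the options.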

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the level number, the higher the level in the hierarchy and the larger the hexagon. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
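The result of a distance join can be pictured with a standalone Python sketch that pairs every record of one list with every record of another and keeps the pairs within the maximum distance. This is a brute-force illustration under stated assumptions, not the joinByDistance implementation, which per the parameter table uses a geohash-based primary join. Distances are great-circle (haversine) distances, and the names haversine_m, join_by_distance, and the sample records are hypothetical.

```python
import math

METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Return (left_row, right_row, distance_m) for every pair within range.

    Rows are dicts with 'lon' and 'lat' keys in WGS 84 degrees.
    """
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["lon"], l_row["lat"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                joined.append((l_row, r_row, d))
    return joined


# Hypothetical customers and points of interest.
customers = [{"id": 1, "lon": -73.750333, "lat": 42.736103}]
pois = [
    {"name": "near", "lon": -73.745333, "lat": 42.736103},  # roughly 400 m east
    {"name": "far", "lon": -73.700333, "lat": 42.736103},   # roughly 4 km east
]

for customer, poi, dist in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
    print(customer["id"], poi["name"], round(dist, 1))
```

Comparing every pair costs time proportional to the product of the two input sizes, which is the reason a coarse primary join on grid cells is used before exact distances are computed on large datasets.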

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table and proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter        Required    Description

-f <file>        yes         Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel>    no          The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have 775 permissions: Read, Write, and Execute for the owner and group (and Read and Execute for others).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
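
As a quick check of what mode 775 grants, the following Python sketch decodes the mode bits using only the standard library. The helper name describe_mode is hypothetical and is not part of the product; it is meant only to illustrate the permission string you would expect to see for the download directory.

```python
import stat


def describe_mode(mode: int) -> str:
    """Return the ls-style permission string for a directory with the given mode bits."""
    # S_IFDIR marks the entry as a directory so the string starts with 'd'.
    return stat.filemode(stat.S_IFDIR | mode)


# Owner and group get rwx; everyone else gets r-x.
print(describe_mode(0o775))
```

If a listing of the download directory shows a different string, the group members may not be able to write downloaded data there.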


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first object's centroid is within the second object

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or subquery

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
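
The wildcard behavior of the Like operator can be sketched outside of MI SQL. The following Python example is an illustration only: it translates a pattern to a regular expression, treating percent (%) as zero or more characters and underscore (_) as exactly one character. The function names are hypothetical, and case sensitivity and escape handling in MI SQL itself are not covered here.

```python
import re


def like_to_regex(pattern: str):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Canada", "Can%"))    # True
print(like("Canada", "C_nada"))  # True
print(like("Canada", "C_da"))    # False
```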


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List item and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers that contain characters that normally would not be allowed (such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In cases where a single quote appears within a string literal or value, use a doubled single quote (two characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
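
When queries are assembled programmatically, the quote rules above can be applied with two small helpers. The following Python sketch is illustrative only; quote_literal and quote_identifier are hypothetical names, not part of the product. Because this section does not describe how a double quote inside an identifier is escaped, the sketch rejects such identifiers rather than guessing.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    if '"' in name:
        raise ValueError("identifier contains a double quote; escaping is not defined here")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
print(query)
```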


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved.


Return Values

Return Type Description

WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry

Examples

SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');


ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));


Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;


Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another with Transform alone. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.

The following Observer index functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
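
To illustrate the kind of bounds filtering described above, here is a minimal Python sketch that runs outside of Hive. It computes an MBR from a list of vertices and keeps only the XY records that fall inside it. The helper names mbr and filter_by_mbr are hypothetical stand-ins for the values that ST_XMin, ST_YMin, ST_XMax, and ST_YMax would supply; they are not part of the product.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(vertices: Iterable[Point]) -> Bounds:
    """Return the minimum bounding rectangle of a set of vertices."""
    pts = list(vertices)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def filter_by_mbr(points: Iterable[Point], bounds: Bounds) -> List[Point]:
    """Keep only the points that lie inside (or on the edge of) the bounds."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in points if xmin <= x <= xmax and ymin <= y <= ymax]


triangle = [(40, 40), (20, 45), (45, 30)]
records = [(25, 35), (50, 35), (30, 46)]
print(filter_by_mbr(records, mbr(triangle)))
```

Note that an MBR filter is only a coarse test: a point can fall inside the rectangle and still lie outside the geometry itself.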


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash, but has the advantage that when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
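
To show how a geohash ID relates to a point and a cell boundary, the following Python sketch implements the widely published public geohash scheme: longitude and latitude ranges are halved alternately, and every 5 bits become one base-32 character. This is an illustration only. The function names geohash_id and geohash_boundary are hypothetical, and it is an assumption that the GeoHashID and GeoHashBoundary UDFs follow the same encoding.

```python
# Standard geohash base-32 alphabet (it omits the letters a, i, l, and o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x: float, y: float, precision: int) -> str:
    """Encode longitude x and latitude y as a geohash of the given length."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    chars = []
    value, bits, use_lon = 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon, x) if use_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1  # upper half: emit a 1 bit
            rng[0] = mid
        else:
            value = value * 2      # lower half: emit a 0 bit
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:              # every 5 bits form one character
            chars.append(_BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id: str):
    """Decode a geohash into its cell bounds as (xmin, ymin, xmax, ymax)."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        idx = _BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (lon[0], lat[0], lon[1], lat[1])


cell = geohash_id(-73.750333, 42.736103, 3)
print(cell, geohash_boundary(cell))
```

Each additional character adds 5 bits of subdivision, which is why longer IDs correspond to smaller cells, and why IDs that share a prefix fall within the same larger cell.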


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)


Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;
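The link between precision and cell size, and the GROUP BY aggregation in the last query, can be illustrated outside Hive. The Python below is only a sketch under stated assumptions: it uses a generic Web Mercator quadtree key as a stand-in, and the name square_key is hypothetical. It is not the SquareHashID algorithm, and its keys are not expected to match the IDs the product returns.

```python
import math
from collections import Counter

MAX_LAT = 85.05112878  # latitude limit of the square Web Mercator map


def square_key(lon, lat, precision):
    """Hypothetical stand-in for a square grid ID: one digit (0-3) per level."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    # Project the WGS 84 point onto the unit square [0, 1] x [0, 1].
    x = (lon + 180.0) / 360.0
    s = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)
    cells = 1 << precision  # cells per axis; doubles with every extra digit
    col = min(cells - 1, max(0, int(x * cells)))
    row = min(cells - 1, max(0, int(y * cells)))
    digits = []
    for level in range(precision, 0, -1):
        mask = 1 << (level - 1)
        digits.append(str((1 if col & mask else 0) + (2 if row & mask else 0)))
    return "".join(digits)


# Rough equivalent of: SELECT hashID, count(*) ... GROUP BY hashID
points = [(-73.750333, 42.736103), (-73.750400, 42.736200), (2.3522, 48.8566)]
counts = Counter(square_key(lon, lat, 10) for lon, lat in points)
```

In this sketch each extra digit splits a cell into four, so a longer key always starts with the shorter key for the same point, which is why longer strings mean smaller cells.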


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry on which the search point lies; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'


Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry on which the search point lies is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP.TAB file and are the values we want in the query result.
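To make the behavior described above concrete, here is a minimal Python sketch of a point-in-polygon lookup that returns the attributes of every polygon containing a point. It uses the common ray-casting test; this is an illustrative technique, not necessarily how the UDTF is implemented. The function names, the two rectangles, and the attribute values are all hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) lies inside the polygon given by `ring`.

    `ring` is a list of (x, y) vertices; the last vertex connects to the first.
    Points exactly on an edge are not handled specially in this sketch.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Count the edges that a horizontal ray from the point crosses.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def matching_rows(x, y, table):
    """Return the attribute rows of every polygon in `table` that contains the point."""
    return [attrs for ring, attrs in table if point_in_polygon(x, y, ring)]


# Hypothetical data: two overlapping rectangles with made-up attributes.
table = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], {"state": "A"}),
    ([(3, 3), (6, 3), (6, 6), (3, 6)], {"state": "B"}),
]
hits = matching_rows(3.5, 3.5, table)
```

A point in the overlap yields one row per containing polygon, which mirrors the way a UDTF can emit several output rows for one input point.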

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like 'Park'"

Return Values

Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
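The way maxCandidates and maxDistance shape the result can be sketched in a few lines of Python. This is an illustration under assumptions, not the UDTF's implementation: the function name nearest is hypothetical, the distances are made-up values, and treating maxDistance as an inclusive limit is an assumption.

```python
import heapq


def nearest(candidates, max_candidates=1, max_distance=None):
    """Pick the closest rows from (distance, row) pairs.

    Assumed semantics: drop rows farther than `max_distance` (None means
    no limit), then keep at most `max_candidates` rows, closest first.
    """
    in_range = [c for c in candidates
                if max_distance is None or c[0] <= max_distance]
    return heapq.nsmallest(max_candidates, in_range, key=lambda c: c[0])


# Hypothetical (distance, name) pairs for one search point.
candidates = [(1200.0, "A"), (300.0, "B"), (5000.0, "C"), (800.0, "D")]
top3 = nearest(candidates, max_candidates=3)
within_1000 = nearest(candidates, max_candidates=3, max_distance=1000.0)
```

With the defaults (one candidate, no distance limit) only the single closest row is kept, matching the default values listed in the options table.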

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


Spark

Spark Jobs

To process large data sets, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.


Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
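The result of a distance join can be pictured with a small, self-contained Python sketch. It is a brute-force illustration under assumptions, not the Spark implementation: it compares every pair of points instead of using a geohash for the primary join, it measures distance with the haversine formula on a sphere, and the names haversine_m and join_by_distance as well as the sample rows are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(rows1, rows2, max_distance_m):
    """Pair every row in rows1 with each row in rows2 within max_distance_m.

    Rows are dicts with 'lon' and 'lat' keys; each output pair also carries
    the computed distance, similar in spirit to a distance column.
    """
    joined = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_m(r1["lon"], r1["lat"], r2["lon"], r2["lat"])
            if d <= max_distance_m:
                joined.append({"left": r1, "right": r2, "distance_m": d})
    return joined


# Hypothetical sample rows: one customer and two points of interest.
customers = [{"id": "c1", "lon": -73.750333, "lat": 42.736103}]
pois = [
    {"id": "near", "lon": -73.7510, "lat": 42.7365},
    {"id": "far", "lon": -73.70, "lat": 42.80},
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
```

Comparing every pair does not scale to large dataframes, which is presumably why the method described above first narrows the candidates with a geohash of the given precision.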


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions when multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to that common operating system group and should grant Read, Write, and Execute permissions to the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
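For readers less familiar with octal modes, the short Python sketch below decodes what 775 grants. It is only an illustration that uses the standard library stat module; the helper name describe is hypothetical and nothing here touches the file system.

```python
import stat


def describe(mode):
    """Summarize who gets which access for a permission mode such as 0o775."""
    return {
        "owner_rwx": (mode & stat.S_IRWXU) == stat.S_IRWXU,
        "group_rwx": (mode & stat.S_IRWXG) == stat.S_IRWXG,
        "others_write": bool(mode & stat.S_IWOTH),
    }


mode = 0o775
summary = describe(mode)
# Render the mode the way a directory listing would show it.
listing = stat.filemode(stat.S_IFDIR | mode)
```

Here the owner and the group both get read, write, and execute, while other users get read and execute only, so every member of the shared group can update the downloaded data.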


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that can contain wildcard characters. Two wildcards are used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
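The wildcard rules for the Like operator can be illustrated with a small Python sketch that translates a pattern into a regular expression. This is not the MI SQL implementation: the function name like is hypothetical, and treating the comparison as case-sensitive with no escape character is an assumption.

```python
import re


def like(value, pattern):
    """Return True if `value` matches a Like pattern using _ and % wildcards."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


ends_with_park = like("Central Park", "%Park")
one_char_gap = like("Park", "P_rk")
```

A pattern without wildcards behaves like an equality test in this sketch, and the two symbols can be combined, as in `P_r%`.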


Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List item and function argument separators

@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
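When filter expressions are assembled in code, the quoting rules above can be applied with two small helpers. The Python below is a minimal sketch: the helper names and the sample table, column, and value are hypothetical, and the identifier helper simply wraps the name without handling an embedded double quote.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    return '"' + name + '"'


# Hypothetical table, column, and value used only for illustration.
query = "SELECT * FROM {} WHERE {} = {}".format(
    quote_identifier("My Table"),
    quote_identifier("Business Name"),
    quote_literal("Joe's Diner"),
)
```

Doubling the embedded quote keeps the literal intact, so a value that contains an apostrophe does not end the string early.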


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved.


Spatial

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

WritableGeometry The convex hull of the geometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
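To make the definition of a convex hull concrete, here is a minimal Python sketch of one standard technique (Andrew's monotone chain) applied to the vertices of the MULTIPOLYGON example above. It illustrates the concept only; the guide does not say which algorithm the ConvexHull UDF uses, and convex_hull is a hypothetical helper name.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last vertex of each chain is the first vertex of the other chain.
    return lower[:-1] + upper[:-1]


# Vertices taken from the MULTIPOLYGON example (closing points omitted).
vertices = [(40, 40), (20, 45), (45, 30),
            (20, 35), (10, 30), (10, 10), (30, 5), (45, 20),
            (30, 20), (20, 15), (20, 25)]
print(convex_hull(vertices))
```

Interior vertices, such as those of the hole ring, never appear in the result because the hull keeps only the outermost points.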


Intersection

Description

The Intersection function returns the geometry (a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
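For readers who want to see what a transformation between the two coordinate systems in the first example involves, the Python sketch below applies the standard spherical Mercator forward formulas to convert a longitude/latitude pair (EPSG:4326) to EPSG:3857. It covers only this one pair of systems and is not the UDF's implementation; wgs84_to_web_mercator is a hypothetical helper name.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # radius used by EPSG:3857, in meters


def wgs84_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to EPSG:3857 meters."""
    x = SPHERE_RADIUS_M * math.radians(lon)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# The point used in the second example above, read as longitude 30, latitude 20.
print(wgs84_to_web_mercator(30, 20))
```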


Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by returning the ordinates of the transformed point.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
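The MBR filtering described above can be pictured with a short Python sketch. It computes the four bounding values from a geometry's vertices, which is the role the ST_XMin, ST_YMin, ST_XMax, and ST_YMax UDFs play in Hive, and then keeps only the XY records that fall inside those bounds. The helper names and the record layout are assumptions made for illustration.

```python
def mbr(vertices):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def rows_in_mbr(rows, box):
    """Keep the records whose x and y fall inside the bounding rectangle."""
    x_min, y_min, x_max, y_max = box
    return [r for r in rows
            if x_min <= r["x"] <= x_max and y_min <= r["y"] <= y_max]


triangle = [(40, 40), (20, 45), (45, 30)]
records = [{"id": 1, "x": 30, "y": 35}, {"id": 2, "x": 10, "y": 35}]
print(mbr(triangle))
print(rows_in_mbr(records, mbr(triangle)))
```

A bounds test like this is only a coarse filter: a record can lie inside the MBR and still be outside the geometry itself.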


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry (the X maximum of its MBR), or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry (the X minimum of its MBR), or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry (the Y maximum of its MBR), or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry (the Y minimum of its MBR), or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

Three types of UDFs are provided, one for each grid cell shape:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following Grid functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
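The mapping from an ID to a cell boundary can be sketched in Python. The sketch assumes that the IDs follow the public geohash scheme (base-32 characters whose bits alternately halve the longitude and latitude ranges); the guide calls the ID well known but does not spell out the algorithm, so treat this as an illustration rather than the UDF's implementation. geohash_bounds is a hypothetical helper name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_bounds(geohash):
    """Return (lon_min, lat_min, lon_max, lat_max) of a geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    use_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


print(geohash_bounds("u"))
print(geohash_bounds("u4pru"))
```

Each additional character narrows the rectangle further, which is why longer IDs correspond to smaller cells.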


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
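The encode-then-aggregate pattern in the last query can be sketched in Python. As before, this assumes the public geohash scheme and is not the UDF's implementation; geohash_id and count_by_cell are hypothetical helper names, and the argument order (X, Y, precision) mirrors the syntax shown above.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bits = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | bit
        bits += 1
        use_lon = not use_lon
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def count_by_cell(points, precision):
    """Count points per cell, like GROUP BY on the hash ID."""
    return Counter(geohash_id(lon, lat, precision) for lon, lat in points)


points = [(10.40744, 57.64911), (10.40750, 57.64900), (0.0, 0.0)]
print(count_by_cell(points, 5))
```

Because nearby points share an ID prefix, grouping on the ID aggregates records by cell without any geometry comparison.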


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format

Options

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides All the geometries where the search point lies within are returned that is if the search point lies on a Line Polyline Point Polygon or MultiPolygon geometry type the respective geometries will be returned in the output

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
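The idea behind a point-in-polygon lookup can be illustrated with a small Python sketch that tests a point against each polygon by ray casting and returns the attributes of every polygon that contains it, much as the LATERAL VIEW above yields one row per match. This is a conceptual sketch only: it handles simple rings without holes, the sample features are invented, and the helper names are hypothetical, not SDK functions.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray to the right crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def local_point_in_polygon(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    x, y = point
    return [attrs for ring, attrs in features if point_in_ring(x, y, ring)]


# Invented sample data: two square regions with attribute rows.
features = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], {"state": "A", "capital": "Alpha"}),
    ([(4, 0), (8, 0), (8, 4), (4, 4)], {"state": "B", "capital": "Beta"}),
]
print(local_point_in_polygon((2, 2), features))
```

Testing every polygon for every point is slow on large files, which is the cost that a prepared index such as the PGD files mentioned in the tip is meant to reduce.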


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format

Options

Option: maxCandidates
Description: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'

Option: maxDistance
Description: The maximum distance to search for results (if not set, the default value is no limit).
Example: 'maxDistance', '25'

Option: distanceUnit
Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'

Option: returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Option: queryFilter
Description: Search on a subset of the data based on an attribute.
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Example: 'queryFilter', 'Name Like Park'

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
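How the maxCandidates and maxDistance options shape a nearest search can be sketched in Python with a brute-force scan. The sketch measures great-circle distance in meters with the haversine formula, which is an assumption made for illustration; the guide does not state how the UDTF measures distance, and the helper names and sample data are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, name) pairs, nearest first."""
    lon, lat = point
    scored = sorted(
        (haversine_m(lon, lat, c_lon, c_lat), name)
        for name, (c_lon, c_lat) in candidates.items()
    )
    if max_distance is not None:
        scored = [(d, n) for d, n in scored if d <= max_distance]
    return scored[:max_candidates]


# Invented sample data: name -> (longitude, latitude).
candidates = {"a": (0.0, 1.0), "b": (0.0, 2.0), "c": (0.0, 3.0)}
print(search_nearest((0.0, 0.0), candidates, max_candidates=2))
```

With the defaults, only the single nearest candidate is returned, which matches the documented default of 1 for maxCandidates and no limit for maxDistance.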


Spark

Spark Jobs

Spark jobs for processing large data sets are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output: [figure not reproduced in this transcript]

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API generates a hexagon with a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
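To make the semantics of a distance join concrete, here is a minimal sketch in plain Python. It is not the Spark API and not how the product computes distances: the haversine formula, the Earth radius constant, the brute-force pairing, and all names (haversine_miles, join_by_distance, the sample records) are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean Earth radius in miles (assumption)


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Pair every left record with every right record within max_miles.

    Returns (left_id, right_id, distance) tuples. This compares all pairs;
    a real implementation would first bucket points (for example by geohash).
    """
    matches = []
    for l in left:
        for r in right:
            d = haversine_miles(l["lon"], l["lat"], r["lon"], r["lat"])
            if d <= max_miles:
                matches.append((l["id"], r["id"], d))
    return matches


customers = [{"id": "c1", "lon": -73.75, "lat": 42.73}]
pois = [
    {"id": "p1", "lon": -73.75, "lat": 42.735},  # roughly a third of a mile north
    {"id": "p2", "lon": -73.75, "lat": 42.75},   # well over a mile north
]
matches = join_by_distance(customers, pois, max_miles=0.5)
print(matches)
```

The third element of each tuple plays the role that the DistanceColumnName option plays in the result dataframe: it carries the calculated distance alongside the joined records.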


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, and the directory should have Read, Write, and Execute permissions for the owner and group (mode 775, which also grants Read and Execute to other users).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
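For readers unfamiliar with octal modes, the following short Python sketch decodes what 775 grants to each class of user. It uses only the standard library; the helper name describe_mode is an assumption made for this illustration.

```python
import stat


def describe_mode(octal_mode):
    """Translate an octal directory mode into rwx strings per user class."""
    # stat.filemode renders a mode the way `ls -l` does, e.g. 'drwxrwxr-x'
    text = stat.filemode(stat.S_IFDIR | octal_mode)
    return {"owner": text[1:4], "group": text[4:7], "others": text[7:10]}


perms = describe_mode(0o775)
print(perms)
```

With mode 775, the owner and the common group can read, write, and traverse the download directory, while all other users can only read and traverse it.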


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
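The wildcard behavior described for the Like operator can be sketched in plain Python by translating a pattern into a regular expression. This is an illustration only, not the MI SQL engine: the function names are hypothetical, and case-sensitive matching is an assumption of this sketch.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    % matches zero, one, or many characters; _ matches exactly one character.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))   # value ends with "Park"
print(like("Parkway", "%Park"))        # extra characters after "Park"
print(like("Park", "P_rk"))            # underscore stands for one character
```

Combining the symbols works the same way; for example, the pattern "_a%" matches any value whose second character is "a".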


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers, or illegal characters (where they normally would not be allowed, such as "/"), are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
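When filter expressions are assembled programmatically, the doubling rule for embedded single quotes is easy to get wrong. The following plain Python sketch shows one way to apply it; the helper names quote_literal and quote_identifier are hypothetical, and the identifier helper assumes the name contains no double-quote characters.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Assumption for this sketch: the name itself contains no double quotes.
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


query = (
    "SELECT * FROM " + quote_identifier("/Samples/NamedTables/USA")
    + " WHERE Business = " + quote_literal("O'haras")
)
print(query)
```

Quoting every literal through one helper keeps values that contain apostrophes from ending the string early.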


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Intersection

Description

The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). It returns the geometry consisting of direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry formed from the direct positions that are common to both input geometries

Examples

SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter Type Description

geometry WritableGeometry The source input geometry

CRS String The destination coordinate system for the geometry

Return Values

Return Type Description

WritableGeometry The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
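To give a sense of what a transformation from epsg:4326 to epsg:3857 does to a single point, the following plain Python sketch applies the standard spherical Web Mercator forward formula. It is an illustration under stated assumptions, not the product's Transform implementation: the function name is hypothetical, and the input is assumed to be longitude/latitude in degrees.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis in meters, as used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 longitude/latitude (degrees) to Web Mercator x/y (meters)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


x, y = lonlat_to_web_mercator(30.0, 20.0)
print(round(x, 2), round(y, 2))
```

Longitude scales linearly into x, while latitude is stretched increasingly toward the poles, which is why the formula is only usable between roughly 85 degrees south and 85 degrees north.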


Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR for a writable geometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
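The idea of filtering XY records by the bounds of a geometry can be sketched in plain Python. This is a conceptual illustration rather than the Hive UDFs themselves: the function names mbr and in_bounds and the sample coordinates are assumptions.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounds(x, y, bounds):
    """Return True if the point lies inside or on the edge of the bounds."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax


# Vertices of a triangle and a small XY table of records
triangle = [(0.0, 0.0), (4.0, 1.0), (2.0, 3.0)]
bounds = mbr(triangle)
records = [(1.0, 1.0), (5.0, 1.0), (2.0, 3.0), (-1.0, 2.0)]
kept = [r for r in records if in_bounds(r[0], r[1], bounds)]
print(bounds, kept)
```

A bounds check like this is a coarse filter: a record inside the MBR is not necessarily inside the geometry itself, so it is typically followed by an exact predicate.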


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid
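The relationship between a geohash ID and its rectangular cell can be illustrated with a minimal Python sketch. It assumes the standard public geohash encoding (base-32 alphabet, bits alternating between longitude and latitude); whether GeoHashBoundary uses exactly this scheme is an assumption, and the function name `geohash_bounds` is hypothetical.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) for a standard geohash string.

    Hypothetical helper for illustration only; it is not part of the product API.
    """
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (value >> shift) & 1
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the interval
            else:
                rng[1] = mid  # keep the lower half of the interval
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]

print(geohash_bounds("ezs42"))
```

Each additional character halves the cell several more times, which is why longer IDs describe smaller cells.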

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary("ebvnk");

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary("-73.750333", "42.736103", 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point
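To make the role of the precision parameter concrete, here is a minimal Python sketch of the standard public geohash encoding. It is an illustration under the assumption that GeoHashID follows that well-known scheme; the function name `geohash_id` is hypothetical and is not part of the product.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_id(x, y, precision):
    """Encode longitude x and latitude y (WGS 84) as a geohash of `precision` characters.

    Hypothetical helper for illustration only.
    """
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    chars = []
    use_lon = True  # bits alternate, starting with longitude
    value = 0
    bits = 0
    while len(chars) < precision:
        rng, coord = (lon, x) if use_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid  # point is in the upper half
        else:
            rng[1] = mid  # point is in the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)

print(geohash_id(-73.750333, 42.736103, 3))
print(geohash_id(-73.750333, 42.736103, 10))
```

Because each character only refines the previous cell, a shorter ID is always a prefix of a longer ID for the same point, which is what makes the IDs sortable and useful for grouping.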

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary("PF625028642");

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary("03332");

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
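The idea of returning every polygon that contains a search point can be sketched in a few lines of Python. The guide does not describe how LocalPointInPolygon is implemented; the even-odd ray-casting test below is a commonly used stand-in, and the names `point_in_polygon` and `find_containing` and the sample data are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray-casting test: is (x, y) inside the polygon ring?

    `ring` is a list of (x, y) vertices; the closing edge is implied.
    Points exactly on an edge are not treated specially in this sketch.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def find_containing(x, y, polygons):
    """Return the names of all polygons that contain the point."""
    return [name for name, ring in polygons.items() if point_in_polygon(x, y, ring)]

# Hypothetical data: two overlapping squares.
polygons = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(3, 3), (6, 3), (6, 6), (3, 6)],
}
print(find_containing(1, 1, polygons))
print(find_containing(3.5, 3.5, polygons))
```

A point in the overlap of two polygons yields two results, which mirrors the way the UDTF can emit more than one row per input point.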

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries in the specified TAB or shapefile that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

maxCandidates
Description: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'

maxDistance
Description: The maximum distance to search for results (if not set, the default value is no limit).
Example: 'maxDistance', '25'

distanceUnit
Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'

returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

queryFilter
Description: Search on a subset of the data based on an attribute.
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Example: 'queryFilter', "Name Like 'Park'"

Return Values

Return Type Description

geometry The geometry or geometries in the specified TAB or shapefile that are nearest to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
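The effect of the maxCandidates and maxDistance options can be illustrated with a small Python sketch. It is not the product's implementation: it assumes point candidates, great-circle (haversine) distances in meters, and a brute-force scan, and the names `search_nearest` and `haversine_m` and the city coordinates are illustrative.

```python
import heapq
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, name) pairs, nearest first.

    Defaults mirror the documented ones: one result, no distance limit.
    """
    lon, lat = point
    scored = []
    for name, (clon, clat) in candidates.items():
        d = haversine_m(lon, lat, clon, clat)
        if max_distance is None or d <= max_distance:
            scored.append((d, name))
    return heapq.nsmallest(max_candidates, scored)

# Illustrative candidate points (approximate city coordinates).
cities = {
    "Albany": (-73.7562, 42.6526),
    "Boston": (-71.0589, 42.3601),
    "Hartford": (-72.6734, 41.7658),
}
print(search_nearest((-73.750333, 42.736103), cities, max_candidates=3))
```

With the defaults only the single nearest candidate comes back; raising max_candidates or setting max_distance changes how many rows each search point can produce.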

Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
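The semantics of a distance join can be shown outside Spark with a short Python sketch. It is a brute-force illustration only: it compares every pair of records using a haversine distance in miles, whereas the product performs a primary join on geohash cells first (how that step works is not described here). The function names and sample records are hypothetical.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean earth radius in miles

def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS 84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_miles, distance_column="outputDistance"):
    """Pair every left record with each right record within max_miles."""
    result = []
    for l in left:
        for r in right:
            d = haversine_mi(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_miles:
                # Merge both records and add the computed distance as a column.
                result.append({**l, **r, distance_column: d})
    return result

# Hypothetical records: one customer and two points of interest.
customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"name": "near", "lon": -73.752, "lat": 42.737},
    {"name": "far", "lon": -73.9, "lat": 42.8},
]
for row in join_by_distance(customers, pois, 0.5):
    print(row)
```

The extra distance column corresponds to what the DistanceColumnName option adds to the result dataframe.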

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property: Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download data and to update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, with permissions set to 775: Read, Write, and Execute for the owner and group, and Read and Execute for all other users.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
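To make the 775 value concrete, the following Python sketch decodes an octal mode into its owner, group, and others parts using only the standard library. The describe helper is a hypothetical name introduced for illustration; nothing here is part of the product.

```python
import stat

MODE = 0o775  # the octal value passed to chmod for the download directory

def describe(mode):
    """Return the rwx string for owner, group, and others."""
    # stat.filemode expects a full st_mode, so add the directory type bit
    # and then drop the leading "d" from the rendered string.
    text = stat.filemode(stat.S_IFDIR | mode)[1:]
    return {"owner": text[0:3], "group": text[3:6], "others": text[6:9]}

print(describe(MODE))
```

The owner and group both get read, write, and execute, which is what lets every service user in the common group update the downloaded data.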


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
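The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is a minimal illustration, not the MI SQL parser: the like helper is hypothetical, and case-sensitive matching is an assumption.

```python
import re

def like(value, pattern):
    """Return True if value matches a Like-style pattern (% and _ wildcards)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None

print(like("Albany", "Alb%"))    # prefix match
print(like("Albany", "A_bany"))  # single-character wildcard
print(like("Albany", "A_any"))   # too few characters for the value
```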


Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List items and function argument separators
 | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
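When queries are assembled in code, the quoting rules can be applied mechanically. The Python sketch below shows one way to do that; quote_literal and quote_identifier are hypothetical helpers written for this illustration, and the identifier helper assumes the name contains no double quote.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes (assumes no embedded double quote)."""
    return '"' + name + '"'

sql = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'haras"))
print(sql)
```

Doubling the embedded quote keeps the literal intact, so the value is read as one string rather than ending early at the apostrophe.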


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
        • Grid Functions
          • GeoHashBoundary
          • GeoHashID
          • HexagonBoundary
          • HexagonID
          • SquareHashBoundary
          • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
          • Hexagons
      • Spark API
        • JoinByDistance
      • Location Intelligence Jar for Spatial Operations
        • Installing and setting up the Location Intelligence Jar for a Spark Job
        • Integrating the Location Intelligence Jar with Zeppelin
        • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The source input geometry
CRS | String | The destination coordinate system for the geometry

Return Values

Return Type | Description
WritableGeometry | The geometry transformed to the destination coordinate system

Examples

SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
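To give a feel for what a transformation from epsg:4326 to epsg:3857 does to a coordinate, here is a Python sketch of the standard spherical Web Mercator forward formula. It is an illustration of that one projection only, under the assumption that the input is longitude and latitude in degrees; it is not the Transform UDF's implementation, and to_web_mercator is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857

def to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

x, y = to_web_mercator(30.0, 20.0)
print(round(x, 2), round(y, 2))
```

The x value scales linearly with longitude, while y stretches with latitude, which is why the formula is only usable away from the poles.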


Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry

Return Values

Return Type | Description
WritableGeometry | The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.

The following Observer index functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
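The idea of filtering XY records by the bounds of a geometry can be sketched in plain Python. The sketch computes the MBR of a list of vertices and keeps only the rows that fall inside it; mbr, in_bounds, and the sample coordinates are hypothetical and stand in for what the ST_XMin, ST_XMax, ST_YMin, and ST_YMax values would supply in a query.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

def in_bounds(x, y, box):
    """Return True if the point lies inside or on the edge of the box."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

# A closed triangle ring and two XY rows, all made up for illustration.
polygon = [(-74.0, 42.0), (-73.0, 42.0), (-73.5, 43.0), (-74.0, 42.0)]
box = mbr(polygon)
rows = [(-73.75, 42.73), (-72.0, 42.5)]
kept = [row for row in rows if in_bounds(row[0], row[1], box)]
print(box, kept)
```

A bounds filter like this is a coarse test: a point inside the MBR is not necessarily inside the geometry itself, so it is typically used to narrow candidates cheaply.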


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash, but has the advantage that when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
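As an illustration of how a point is encoded into a grid cell ID, the Python sketch below implements the publicly documented geohash scheme: the longitude and latitude ranges are halved alternately, and every five resulting bits become one base-32 character. It assumes the geohash UDFs follow that public scheme, which this guide does not state explicitly; geohash_id is a hypothetical name, and the hexagon and square hashes use their own encodings that are not covered here.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash string of `precision` characters."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    even = True  # bits alternate between the axes, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        even = not even
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

print(geohash_id(-5.6, 42.6, 5))
```

Because each extra character halves the cell several more times, a longer ID always names a smaller cell nested inside the cell named by its prefix, which is what makes the IDs sortable and useful for aggregation.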


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
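Going from an ID back to a cell boundary is the reverse of encoding. The Python sketch below decodes a geohash string into its bounding rectangle using the public geohash scheme, on the assumption that the UDF follows it; geohash_bounds is a hypothetical name, and the real function returns a WritableGeometry rather than a tuple.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_bounds(hash_id):
    """Return (min_lon, min_lat, max_lon, max_lat) for a geohash string."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    even = True  # bits alternate between the axes, starting with longitude
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # read the five bits, most significant first
            bit = (value >> shift) & 1
            rng = lon_range if even else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # a 1 bit selects the upper half
            else:
                rng[1] = mid  # a 0 bit selects the lower half
            even = not even
    return lon_range[0], lat_range[0], lon_range[1], lat_range[1]

print(geohash_bounds("ezs42"))
```

Each additional character narrows the rectangle further, so a three-character ID describes a much larger cell than a ten-character one.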


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF function returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries within which the search point lies are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Example | Description
shpCharset | 'shpCharset', 'utf-8' | the charset to use when reading a shapefile
shpCrs | 'shpCrs', 'epsg:4326' | the coordinate reference system to use when reading a shapefile
remoteDataSourceLocation | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip' | the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
downloadLocation | 'downloadLocation', '/pb/downloads' | the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup | 'downloadGroup', 'pbdownloads' | the operating system group which should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions.

Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table being used to get the points we are searching from. The pipresult.capital and pipresult.state fields are from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
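For readers unfamiliar with point-in-polygon searches, the Python sketch below shows the classic ray-casting test and a lookup that returns the attributes of every polygon containing a point. It is a conceptual illustration only: the guide does not describe the algorithm the UDTF uses, and point_in_polygon, find_containing, and the sample regions are all hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside

def find_containing(x, y, polygons):
    """Return the attribute value of every polygon that contains the point."""
    return [name for name, ring in polygons if point_in_polygon(x, y, ring)]

# Two made-up square regions that share an edge.
regions = [
    ("west", [(0, 0), (4, 0), (4, 4), (0, 4)]),
    ("east", [(4, 0), (8, 0), (8, 4), (4, 4)]),
]
print(find_containing(2, 2, regions))
print(find_containing(9, 2, regions))
```

A lookup like this scans every polygon for every point, which is why a prepared index matters for large data sets.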


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries in the specified TAB or shapefile that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that is available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options map Optional. Options, in <String, String> format, that allow you to return more than one value, return additional information, or set other return criteria.

Options

Option: maxCandidates
Description: The maximum number of results to return. If not set, the default value is 1.
Example: 'maxCandidates', '3'

Option: maxDistance
Description: The maximum distance to search for results. If not set, the default value is no limit.
Example: 'maxDistance', '25'

Option: distanceUnit
Description: The distance unit. If not set, the default value is m for meters. See the Distance function for examples of supported distance units.
Example: 'distanceUnit', 'mi'

Option: returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source. Required only if you are storing and distributing data remotely on HDFS or S3.
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded. Required only if you are storing and distributing data remotely on HDFS or S3.
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value of the pb.download.group Hive variable. Required only if you are storing and distributing data remotely on HDFS or S3. For more information, see Download Permissions.
Example: 'downloadGroup', 'pbdownloads'

Option: queryFilter
Description: Search on a subset of the data based on an attribute.
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.
Example: 'queryFilter', "Name Like '%Park%'"

Return Values

Return Type Description

geometry The geometry or geometries in the specified TAB or shapefile that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  '/STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
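To make the effect of the maxCandidates and maxDistance options concrete, the following is a minimal Python sketch of a brute-force nearest search over point records. It is an illustration only, not the UDTF's implementation: the function names, the sample capitals, and the use of a spherical haversine distance in meters are all assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, records, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance_m, record) pairs, nearest first.

    Records farther away than max_distance_m are dropped; None means no limit.
    """
    lon, lat = point
    hits = []
    for record in records:
        distance = haversine_m(lon, lat, record["lon"], record["lat"])
        if max_distance_m is None or distance <= max_distance_m:
            hits.append((distance, record))
    hits.sort(key=lambda hit: hit[0])  # sort by distance only
    return hits[:max_candidates]


# Hypothetical sample data: a few state capitals as lon/lat points.
capitals = [
    {"capital": "Albany", "lon": -73.7562, "lat": 42.6526},
    {"capital": "Boston", "lon": -71.0589, "lat": 42.3601},
    {"capital": "Hartford", "lon": -72.6734, "lat": 41.7658},
    {"capital": "Sacramento", "lon": -121.4944, "lat": 38.5816},
]
search_point = (-74.0060, 40.7128)  # New York City

# Ask for up to 3 candidates, but only within 250 km of the search point.
nearest = search_nearest(search_point, capitals, max_candidates=3, max_distance_m=250_000)
nearest_names = [record["capital"] for _, record in nearest]
```

In this sketch the distance limit is applied before the candidate limit, so a search can return fewer rows than maxCandidates when few records fall inside the search distance.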


Spark

Spark Jobs

Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data for processing large sets of data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always yields a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
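When choosing a geohashPrecision, it can help to know roughly how large a geohash cell is at each precision. The sketch below assumes the standard geohash layout (5 bits per character, bits alternating between longitude and latitude, starting with longitude); whether joinByDistance uses exactly this layout internally is an assumption, and the function name and the meters-per-degree constant are illustrative only.

```python
def geohash_cell_degrees(precision):
    """Return (width, height) in degrees of a standard geohash cell."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    total_bits = 5 * precision        # each base-32 character carries 5 bits
    lon_bits = (total_bits + 1) // 2  # longitude takes the first bit, so it gets any extra one
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


METERS_PER_DEGREE_AT_EQUATOR = 111_320  # approximate length of one degree of longitude

for precision in (5, 6, 7, 8):
    width_deg, height_deg = geohash_cell_degrees(precision)
    width_m = round(width_deg * METERS_PER_DEGREE_AT_EQUATOR)
    print(f"precision {precision}: about {width_m} m wide at the equator")
```

Cells shrink quickly as the precision grows, so a higher precision means many more cells for the same maxDistance, which is consistent with the note above that higher values may require more memory.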


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin

Hue notebook Integrating the Location Intelligence Jar with Hue

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the LI jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
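As a quick way to see what mode 775 grants, the following Python sketch applies it to a throwaway temporary directory and renders the permission bits in the familiar ls -l form. The temporary directory stands in for the real download directory, and the helper name is hypothetical; the sketch assumes a POSIX file system.

```python
import os
import stat
import tempfile


def describe_mode(mode_bits):
    """Render directory permission bits the way `ls -l` shows them."""
    return stat.filemode(stat.S_IFDIR | mode_bits)


# Apply 775 to a temporary directory (a stand-in for the download directory).
with tempfile.TemporaryDirectory() as download_dir:
    os.chmod(download_dir, 0o775)
    applied = stat.S_IMODE(os.stat(download_dir).st_mode)

# Owner and group get read, write, and execute; everyone else gets read and execute.
print(describe_mode(applied))
```

The group triplet is the part that matters here: because every service user belongs to the common group, any of them can create and replace files in the download directory.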


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that can contain wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
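The wildcard rules for the Like operator can be sketched by translating a pattern into a regular expression. This is an illustration of the semantics described in the table, not the MI SQL implementation; it assumes case-sensitive matching and no escape character, neither of which is stated above, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))   # value ends with Park
print(like("Park Avenue", "%Park"))    # value does not end with Park
print(like("Parks", "Park_"))          # exactly one character after Park
```

Combining the two wildcards works the same way: a pattern such as _a% matches any value whose second character is a.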


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers that contain characters that would not normally be allowed (such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal Ohara's is defined:

SELECT * FROM Streets WHERE Business = 'Ohara''s'
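When filter expressions are assembled programmatically, the quoting rules above can be applied with two small helpers. This is a minimal Python sketch with hypothetical function names. Doubling embedded single quotes follows the rule described above; how a double quote inside an identifier should be handled is not covered here, so the sketch simply rejects such identifiers.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because their handling is not
    described in the quote rules (an assumption made for this sketch).
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
query = "SELECT * FROM " + table + " WHERE Country = " + quote_literal("Canada")
print(query)
```

Building the literal through a helper rather than by hand avoids malformed expressions when a value happens to contain an apostrophe.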


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc. All rights reserved.

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
        • Grid Functions
          • GeoHashBoundary
          • GeoHashID
          • HexagonBoundary
          • HexagonID
          • SquareHashBoundary
          • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
          • Hexagons
      • Spark API
        • JoinByDistance
      • Location Intelligence Jar for Spatial Operations
        • Installing and setting up the Location Intelligence Jar for a Spark Job
        • Integrating the Location Intelligence Jar with Zeppelin
        • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules

Union

Description

The Union function returns a geometry object that represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Parameter Type Description

geometry1 WritableGeometry The first instance of a WritableGeometry

geometry2 WritableGeometry The second instance of a WritableGeometry

Return Values

Return Type Description

WritableGeometry The geometry that represents the union of the input geometries

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that it cannot by itself transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) of a WritableGeometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
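To illustrate what the four MBR observer functions return and how they support filtering an XY table, here is a minimal Python sketch. It works on a plain list of coordinate pairs rather than a WritableGeometry, and the function names and sample data are hypothetical.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def within_bounds(x, y, bounds):
    """Return True if the point (x, y) lies inside or on the bounding rectangle."""
    x_min, y_min, x_max, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max


# Exterior ring of a hypothetical polygon.
ring = [(-1.0, 2.0), (3.0, 2.0), (4.0, 3.5), (3.0, 5.0), (-1.0, 5.0), (-1.0, 2.0)]
bounds = mbr(ring)  # plays the role of ST_XMin, ST_YMin, ST_XMax, ST_YMax

# Rows of a hypothetical XY table, filtered by the polygon's bounds.
xy_rows = [(0.0, 3.0), (3.8, 3.5), (5.0, 3.0)]
kept = [row for row in xy_rows if within_bounds(row[0], row[1], bounds)]
```

A bounds filter like this is only a coarse test: a row can fall inside the MBR while still lying outside the geometry itself, so an exact predicate is typically applied to the rows that survive.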

ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 50

Spatial

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry

Function Registration

create function ST_YMax as compbbigdataspatialhiveobserverST_YMax

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin'

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src
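
Taken together, ST_XMin, ST_YMin, ST_XMax, and ST_YMax describe the bounding box of a geometry. The following Python sketch illustrates that idea outside of Hive. It is not product code: the envelope function and the sample polygon are hypothetical, and the sketch assumes a geometry can be reduced to a list of (x, y) coordinates.

```python
from typing import Iterable, Optional, Tuple

Coordinate = Tuple[float, float]


def envelope(coords: Iterable[Coordinate]) -> Optional[Tuple[float, float, float, float]]:
    """Return (x_min, y_min, x_max, y_max), or None when there are no coordinates."""
    points = list(coords)
    if not points:
        # Mirrors the "Null if the specified value is not a geometry" behavior.
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Vertices of a small polygon in longitude/latitude order (illustrative values).
polygon = [(-73.8, 42.6), (-73.6, 42.6), (-73.6, 42.8), (-73.8, 42.8), (-73.8, 42.6)]
print(envelope(polygon))
```

The four values correspond, in order, to what ST_XMin, ST_YMin, ST_XMax, and ST_YMax would report for the same geometry.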


Grid Functions

A grid divides the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

Three types of UDFs are provided, one for each grid cell shape:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to geohash, but has the advantage that the cells appear as squares when displayed in Popular Mercator.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
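
To make the idea of hashing a location into a cell ID concrete, the following Python sketch encodes a longitude and latitude using the publicly documented geohash scheme (alternating longitude and latitude bisection, five bits per base-32 character). It is an illustration only: geohash_encode is a hypothetical helper, and the sketch assumes, without confirmation from this guide, that the product's geohash IDs follow the same public scheme.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Encode a WGS 84 longitude/latitude as a geohash of `precision` characters."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    value = 0        # bits collected for the current character
    bit_count = 0
    use_lon = True   # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid   # keep the upper half
        else:
            value = value << 1
            rng[1] = mid   # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(-5.6, 42.6, 5))
```

Because each extra character halves the cell several more times, a longer ID always names a smaller cell nested inside the cell named by its prefix.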


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary'

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable

SELECT GeoHashBoundary('ebvnk')

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable

SELECT GeoHashBoundary(-73.750333, 42.736103, 3)

SELECT GeoHashBoundary('-73.750333', '42.736103', 3)
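
The relationship between a cell ID and its rectangular boundary can be sketched in Python as follows. This is an illustration under stated assumptions, not the product implementation: geohash_bounds is a hypothetical helper that decodes an ID using the public geohash scheme, and the guide does not confirm that the product's IDs follow that scheme exactly.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_bounds(cell_id: str):
    """Return (lon_min, lat_min, lon_max, lat_max) for a geohash cell ID."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    use_lon = True  # bits alternate between axes, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):  # most significant bit first
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # bit 1 keeps the upper half
            else:
                rng[1] = mid  # bit 0 keeps the lower half
            use_lon = not use_lon
    return (lon_range[0], lat_range[0], lon_range[1], lat_range[1])


lon_min, lat_min, lon_max, lat_max = geohash_bounds("ezs42")
print(lon_min, lat_min, lon_max, lat_max)
```

An empty ID decodes to the whole world, and every additional character narrows the rectangle, which is why a higher precision yields a smaller boundary.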


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID'

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable

SELECT GeoHashID(-73.750333, 42.736103, 3)

SELECT GeoHashID('-73.750333', '42.736103', 3)

CREATE TEMPORARY TABLE tmptbl AS
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

INSERT INTO TABLE coordinates_with_hash
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary'

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable

SELECT HexagonBoundary('PF625028642')

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable

SELECT HexagonBoundary(-73.750333, 42.736103, 3)


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID'

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable

SELECT HexagonID(-73.750333, 42.736103, 3)

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary'

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable

SELECT SquareHashBoundary('03332')

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable

SELECT SquareHashBoundary(-73.750333, 42.736103, 3)

SELECT SquareHashBoundary('-73.750333', '42.736103', 3)


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID'

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable

SELECT SquareHashID(-73.750333, 42.736103, 3)

SELECT SquareHashID('-73.750333', '42.736103', 3)

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID


Search Functions

The following search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within: if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon'

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

shpCharset
  Description: The charset to use when reading a shapefile.
  Example: 'shpCharset', 'utf-8'

shpCrs
  Description: The coordinate reference system to use when reading a shapefile.
  Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
  Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
  Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
  Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
  Example: 'downloadLocation', '/pb/downloads'
  Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
  Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
  Example: 'downloadGroup', 'pbdownloads'


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned: if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')) pipresult

In the above example, id is a field from the pip_points table, which is the table that supplies the points to search from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
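
Conceptually, a point-in-polygon search tests the input point against each candidate polygon and returns every polygon that contains it. The following Python sketch shows one common way to perform that test, the ray casting (even-odd) rule. It is an illustration only: the function names and sample polygons are hypothetical, the technique is a textbook one rather than the product's documented algorithm, and points exactly on an edge are not handled specially here.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_ring(point: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going in +x from the point."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point: Point, polygons: Dict[str, Sequence[Point]]) -> List[str]:
    """Return the names of all polygons that contain the point."""
    return [name for name, ring in polygons.items() if point_in_ring(point, ring)]


# Two overlapping squares and one distant square (illustrative data).
polygons = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(2, 2), (6, 2), (6, 6), (2, 6)],
    "C": [(10, 10), (12, 10), (12, 12), (10, 12)],
}
print(polygons_containing((3, 3), polygons))
```

A point in the overlap of two polygons matches both, which parallels the way the UDTF can return more than one row for a single input point.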

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest'

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

maxCandidates
  Description: The maximum number of results to return (if not set, the default value is 1).
  Example: 'maxCandidates', '3'

maxDistance
  Description: The maximum distance to search for results (if not set, the default value is no limit).
  Example: 'maxDistance', '25'

distanceUnit
  Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
  Example: 'distanceUnit', 'mi'

returnDistanceColumnName
  Description: The name of the column to use for returning the distance.
  Example: 'returnDistanceColumnName', 'Miles'

shpCharset
  Description: The charset to use when reading a shapefile.
  Example: 'shpCharset', 'utf-8'

shpCrs
  Description: The coordinate reference system to use when reading a shapefile.
  Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
  Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
  Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
  Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
  Example: 'downloadLocation', '/pb/downloads'
  Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup
  Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
  Example: 'downloadGroup', 'pbdownloads'

queryFilter
  Description: Search on a subset of the data based on an attribute.
  Example: 'queryFilter', 'Name Like Park'
  Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', 'Name Like Park')) nearestresult

In the above example, id is a field from the search_points table, which is the table that supplies the points to search from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
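
The effect of the maxCandidates and maxDistance options can be sketched in plain Python as a filter followed by a sort and a cut-off. This is an illustration only, not the product's implementation: search_nearest, haversine_m, and the sample data are hypothetical, candidates are reduced to single points, and distances are approximated with the haversine formula on a spherical earth.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point: Tuple[float, float],
                   candidates: Dict[str, Tuple[float, float]],
                   max_candidates: int = 1,
                   max_distance_m: Optional[float] = None) -> List[str]:
    """Return up to max_candidates names, nearest first, within max_distance_m if given."""
    scored = []
    for name, (lon, lat) in candidates.items():
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [name for _, name in scored[:max_candidates]]


# Candidate points at increasing distance east of the search point (illustrative data).
candidates = {"far": (1.0, 0.0), "near": (0.01, 0.0), "mid": (0.1, 0.0)}
print(search_nearest((0.0, 0.0), candidates, max_candidates=3, max_distance_m=20000))
```

With the defaults (one candidate, no distance limit) only the single closest record is returned, which matches the default values described in the options table.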

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
--master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
/dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
-output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location. That ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, each hexagon is generated with a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
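
To show what a distance join computes, independent of Spark, the following Python sketch pairs every record in one list with every record in another that lies within a maximum distance, and adds the computed distance as an extra field. It is a conceptual illustration only: join_by_distance, haversine_m, and the sample records are hypothetical, the brute-force nested loop ignores the geohash-based primary join that the API uses, and the haversine formula on a spherical earth stands in for whatever distance calculation the product performs.

```python
import math
from typing import Dict, List

EARTH_RADIUS_M = 6371008.8   # mean earth radius in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left: List[Dict], right: List[Dict], max_distance_m: float) -> List[Dict]:
    """Pair each left record with every right record within max_distance_m."""
    rows = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                # Merge both records and add the distance, like a distance column option.
                rows.append({**l, **r, "outputDistance": d})
    return rows


customers = [{"id": "c1", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "p1", "lon": -73.752, "lat": 42.737},      # a short walk away
    {"poi": "p2", "lon": -73.70, "lat": 42.736103},    # several kilometers away
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([row["poi"] for row in result])
```

A real distance join avoids comparing every pair; grouping records by grid cell first (the role of the geohashPrecision parameter) limits comparisons to nearby records.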


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver \
--master yarn --deploy-mode cluster \
--jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
/path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-ltversiongtzip distribution file Because environments and use cases vary widely we recommend you evaluate the PGD index against a sample to verify that it improves performance

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| -f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file. |
| -p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine. |

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group, and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
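As a quick sanity check after the final step, the permission bits of the download directory can be inspected programmatically. The following Python sketch is illustrative only: the helper name has_mode_775 is hypothetical, and a temporary directory stands in for your real download directory.

```python
import os
import stat
import tempfile


def has_mode_775(path):
    """Return True if the rwx permission bits of path are exactly rwxrwxr-x (775)."""
    mode = stat.S_IMODE(os.stat(path).st_mode) & 0o777
    return mode == 0o775


# Demonstration on a throwaway directory (a stand-in for the real download directory).
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o700)
print(has_mode_775(demo_dir))  # owner-only access, so this is not 775
os.chmod(demo_dir, 0o775)
print(has_mode_775(demo_dir))  # now matches 775
```

The check covers only the mode bits; group ownership still has to be confirmed separately, for example with ls -ld on the directory.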


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

| Operator | Definition |
| --- | --- |
| Attribute operators | =, <>, !=, <, <=, >, >= |
| Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator. |
| EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect. |
| Contains | Returns true if the first object contains all of the second object. |
| Within | Returns true if the first object is entirely inside the second object. |
| ContainsCentroid | Returns true if the first object contains the centroid of the second object. |
| CentroidWithin | Returns true if the first object's centroid is within the second object. |
| Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object. |
| In (List) | Returns true if the value equals at least one of the values in the literal list or subquery. |
| Like | Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination. |
| AND | Returns true if both conditions in the WHERE clause are true. |
| OR | Returns true if either the first or second condition is true. |
| NOT | Reverses the meaning of the logical operator with which it is used. |
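To make the Like wildcard rules concrete, here is a minimal Python sketch that translates a Like pattern into a regular expression. It is an illustration of the wildcard semantics described in the table, not the MI SQL engine's implementation; the function names are hypothetical, and the sketch is case-sensitive and has no escape character, which may differ from the engine's behavior.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    '_' matches exactly one character and '%' matches zero or more characters;
    every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Canada", "Can%"))    # True: '%' absorbs "ada"
print(like("Canada", "C_nada"))  # True: '_' stands for the single "a"
print(like("Canada", "C_da"))    # False: '_' cannot stand for "ana"
```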


Arithmetic Operators

| Operator | Definition |
| --- | --- |
| + | Addition; also the concatenation operator. Note: String concatenation also uses &. |
| - | Subtraction |
| * | Multiplication |
| / | Division |
| ^ | Exponentiation |

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

| Delimiter | Definition |
| --- | --- |
| ( ) | Expression delimiters |
| ' ' | String constant delimiters. See Quote Rules. |
| " " | Quoted identifier delimiters |
| % _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character |
| , | List items and function argument separators |
| @ | Parameter names |

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples


Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
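When queries are assembled by a program, the quoting rules above can be applied with two small helpers. This Python sketch is an assumption-laden illustration: the names quote_literal and quote_identifier are hypothetical, and because this guide does not describe how to escape a double quote inside an identifier, the sketch rejects such names rather than guess.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Escaping of embedded double quotes is not covered by the quote rules,
    so this sketch refuses identifiers that contain one.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote: %r" % name)
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
print(query)  # SELECT * FROM "Streets" WHERE Business = 'O''hara''s'
```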


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the TRANSFORM UDF accepts and returns a geometry, which means that it cannot, by itself, transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a writable geometry.

The following Observer functions are available:

• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
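The MBR values that the ST_XMin, ST_XMax, ST_YMin, and ST_YMax functions expose can be pictured with a small Python sketch. This is a conceptual illustration over plain coordinate tuples, not the SDK's implementation; the function names mbr and in_bounds are hypothetical.

```python
def mbr(points):
    """Return (x_min, y_min, x_max, y_max) for a non-empty list of (x, y) tuples."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounds(x, y, bounds):
    """Return True if the point (x, y) falls inside or on the edge of the bounds."""
    x_min, y_min, x_max, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max


# A triangle and two candidate records from an XY table.
triangle = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)]
bounds = mbr(triangle)
print(bounds)                       # (0.0, 0.0, 2.0, 3.0)
print(in_bounds(1.5, 2.5, bounds))  # True
print(in_bounds(2.5, 1.0, bounds))  # False
```

Note that a bounds filter is only a coarse test: the point (1.5, 2.5) lies inside the MBR but outside the triangle itself.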


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry | WritableGeometry | The input geometry. |

Return Values

| Return Type | Description |
| --- | --- |
| Double | The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null. |

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry | WritableGeometry | The input geometry. |

Return Values

| Return Type | Description |
| --- | --- |
| Double | The maximum X value of the input geometry, or Null if the specified value is not a geometry. |

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry | WritableGeometry | The input geometry. |

Return Values

| Return Type | Description |
| --- | --- |
| Double | The minimum X value of the input geometry, or Null if the specified value is not a geometry. |

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry | WritableGeometry | The input geometry. |

Return Values

| Return Type | Description |
| --- | --- |
| Double | The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null. |

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry | WritableGeometry | The input geometry. |

Return Values

| Return Type | Description |
| --- | --- |
| Double | The maximum Y value of the input geometry, or Null if the specified value is not a geometry. |

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| geometry | WritableGeometry | The input geometry. |

Return Values

| Return Type | Description |
| --- | --- |
| Double | The minimum Y value of the input geometry, or Null if the specified value is not a geometry. |

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
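To show how a point and a precision map to a cell ID, the following Python sketch implements the publicly documented geohash encoding (alternating longitude and latitude bisection, base-32 output). It is offered as an illustration of the hashing idea only: the function name geohash_id is hypothetical, and the IDs produced by the SDK's GeoHashID UDF are not guaranteed to match this sketch.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash string of the given length."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, x) if use_lon else (lat_range, y)
        mid = (rng[0] + rng[1]) / 2.0
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_id(-5.6, 42.6, 5))  # ezs42
print(geohash_id(-5.6, 42.6, 3))  # ezs
```

Because each additional character subdivides the previous cell, a shorter ID is always a prefix of a longer ID for the same point, which is what makes the IDs sortable and useful for aggregation.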


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| UNIQUE_ID | String | The unique geohash identifier of a cell in a grid. |

Return Values

| Return Type | Description |
| --- | --- |
| WritableGeometry | A representation of the boundary of a cell in a grid. |

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| X | Number or String | The longitude value of the point. |
| Y | Number or String | The latitude value of the point. |
| precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells). |

Return Values

| Return Type | Description |
| --- | --- |
| WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into. |

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| X | Number or String | The longitude value of the point. |
| Y | Number or String | The latitude value of the point. |
| precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells). |

Return Values

| Return Type | Description |
| --- | --- |
| String | The geohash ID of the grid cell, at the specified precision, that contains the point. |

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| UNIQUE_ID | String | The unique geohash identifier of a cell in a grid. |

Return Values

| Return Type | Description |
| --- | --- |
| WritableGeometry | A representation of the boundary of a cell in a grid. |

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| X | Number or String | The longitude value of the point. |
| Y | Number or String | The latitude value of the point. |
| precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells). |

Return Values

| Return Type | Description |
| --- | --- |
| WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into. |

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| X | Number or String | The longitude value of the point. |
| Y | Number or String | The latitude value of the point. |
| precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells). |

Return Values

| Return Type | Description |
| --- | --- |
| String | The ID of the grid cell, at the specified precision, that contains the point. |

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| UNIQUE_ID | String | The unique geohash identifier of a cell in a grid. |

Return Values

| Return Type | Description |
| --- | --- |
| WritableGeometry | A representation of the boundary of a cell in a grid. |

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| X | Number or String | The longitude value of the point. |
| Y | Number or String | The latitude value of the point. |
| precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells). |

Return Values

| Return Type | Description |
| --- | --- |
| WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into. |

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| X | Number or String | The longitude value of the point. |
| Y | Number or String | The latitude value of the point. |
| precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells). |

Return Values

| Return Type | Description |
| --- | --- |
| String | The ID of the grid cell, at the specified precision, that contains the point. |

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| inputPoint | WritableGeometry | A WritableGeometry representing a point. |
| dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below. |
| options | Map | Optional. Options that allow you to set return criteria, in <String, String> format. |

Options

| Option | Description | Example |
| --- | --- | --- |
| shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8' |
| shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326' |
| remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip' |
| downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads' |
| downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads' |


Return Values

| Return Type | Description |
| --- | --- |
| geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output. |

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
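For readers unfamiliar with the underlying test, a point-in-polygon check can be sketched with the classic ray-casting technique. This Python example is a conceptual illustration only and makes no claim about how the UDTF is implemented; the function name is hypothetical, the polygon is a single ring of (x, y) tuples, and points exactly on an edge are not treated specially here, whereas the UDTF also returns geometries that the point lies on.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) is strictly inside the polygon ring (ray casting).

    ring is a list of (x, y) vertices; the last vertex is implicitly
    connected back to the first.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
        j = i
    return inside


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, square))  # True
print(point_in_polygon(5.0, 2.0, square))  # False
```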


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options map Optional. Options, in <String, String> format, that allow you to return more than one value, return additional information, or set other return criteria.

Options

Option Description Example

maxCandidates  The maximum number of results to return (if not set, the default value is 1).  'maxCandidates', '3'

maxDistance  The maximum distance to search for results (if not set, the default value is no limit).  'maxDistance', '25'

distanceUnit  The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.  'distanceUnit', 'mi'

returnDistanceColumnName  The name of the column to use for returning the distance.  'returnDistanceColumnName', 'Miles'

shpCharset  The charset to use when reading a shapefile.  'shpCharset', 'utf-8'

shpCrs  The coordinate reference system to use when reading a shapefile.  'shpCrs', 'epsg:4326'

remoteDataSourceLocation  The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).  'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation  The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.  'downloadLocation', '/pb/downloads'

downloadGroup  The operating system group which should be applied to downloaded data on a local file system. The default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.  'downloadGroup', 'pbdownloads'

queryFilter  Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.  'queryFilter', "Name Like 'Park'"

Return Values

Return Type Description

geometry The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map('maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
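The interaction of the maxCandidates and maxDistance options can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: candidates are points, distance is planar (Euclidean) in the units of the coordinates, and the function and parameter names are hypothetical rather than part of the product.

```python
import math


def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, attributes) pairs, nearest first.

    candidates is a list of (attributes, (x, y)) pairs. A max_distance of
    None means no distance limit, mirroring the documented default.
    """
    px, py = point
    scored = []
    for attrs, (cx, cy) in candidates:
        distance = math.hypot(cx - px, cy - py)
        if max_distance is None or distance <= max_distance:
            scored.append((distance, attrs))
    scored.sort(key=lambda pair: pair[0])
    return scored[:max_candidates]


capitals = [("a", (0, 1)), ("b", (3, 4)), ("c", (0, 2)), ("d", (10, 10))]
top1 = search_nearest((0, 0), capitals)
top3 = search_nearest((0, 0), capitals, max_candidates=3)
close = search_nearest((0, 0), capitals, max_candidates=3, max_distance=2.5)
```

With the defaults only the single nearest candidate is returned; raising max_candidates widens the result, and max_distance then trims any candidate that is too far away.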

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
  --master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
  /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
  -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
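The idea of assigning each location a cell ID and then aggregating by that ID can be illustrated with a short Python sketch. Note the assumption: the product's hexagon IDs are not reproduced here, so a simple square-grid key (the hypothetical cell_id function) stands in for the hexagon ID. Only the aggregation pattern carries over.

```python
import math
from collections import defaultdict


def cell_id(lon, lat, cell_size_deg=1.0):
    """Hypothetical stand-in for a hexagon ID: the key of a square grid cell."""
    col = math.floor((lon + 180.0) / cell_size_deg)
    row = math.floor((lat + 90.0) / cell_size_deg)
    return f"{row}_{col}"


def mean_by_cell(records, cell_size_deg=1.0):
    """Average the value of (lon, lat, value) records per grid cell."""
    totals = defaultdict(lambda: [0, 0.0])  # cell ID -> [count, sum]
    for lon, lat, value in records:
        bucket = totals[cell_id(lon, lat, cell_size_deg)]
        bucket[0] += 1
        bucket[1] += value
    return {key: total / count for key, (count, total) in totals.items()}


signal = [(-73.9, 40.7, 10.0), (-73.5, 40.2, 20.0), (2.3, 48.8, 5.0)]
averages = mean_by_cell(signal)
```

Because the same longitude/latitude always maps to the same ID, records from different data sets can be aggregated independently and then joined on the cell ID.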

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path are:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method which joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to.

df1Longitude Column The longitude value from the first dataframe.

df1Latitude Column The latitude value from the first dataframe.

df2Longitude Column The longitude value from the second dataframe.

df2Latitude Column The latitude value from the second dataframe.

maxDistance Length The buffer length around point 1 to search for point 2.

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance. The value is the name of the new column.

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

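This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

// Build a Length of 0.5 miles from a MapInfo unit code
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

// Join baseDF to joinDF using a geohash precision of 7
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

// DistanceColumnName adds the calculated distance to the result as the outputDistance column
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))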
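For readers who want to see the shape of a distance join without a cluster, here is a small, self-contained Python sketch. It is not the joinByDistance implementation: it assumes a plain latitude/longitude grid as the coarse "primary join" (in place of geohashes), uses the haversine formula for the exact distance check, and ignores the antimeridian. The function names and the cell_deg parameter are hypothetical, and cell_deg must be large enough that max_distance_m fits inside one cell in both directions.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, cell_deg):
    """Join (id, lon, lat) rows whose distance is at most max_distance_m."""
    # Coarse step: bucket the right-hand rows by grid cell.
    grid = defaultdict(list)
    for rid, lon, lat in right:
        grid[(math.floor(lon / cell_deg), math.floor(lat / cell_deg))].append((rid, lon, lat))

    results = []
    for lid, lon, lat in left:
        cx, cy = math.floor(lon / cell_deg), math.floor(lat / cell_deg)
        # Exact step: check only rows in the same and neighbouring cells.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for rid, rlon, rlat in grid.get((cx + dx, cy + dy), ()):
                    distance = haversine_m(lon, lat, rlon, rlat)
                    if distance <= max_distance_m:
                        results.append((lid, rid, distance))
    return results


customers = [("c1", -73.9857, 40.7484)]
pois = [("near", -73.9851, 40.7489), ("far", -73.9000, 40.7000)]
half_mile_m = 804.672
pairs = join_by_distance(customers, pois, half_mile_m, cell_deg=0.05)
```

The coarse bucketing is what keeps the join from comparing every pair of rows; a finer grid means fewer exact checks per row but more buckets to hold in memory, which parallels the memory note on geohashPrecision.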


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue -> Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment the following line and change it from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hue.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have 775 permissions, which grant Read, Write, and Execute to the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window in which no job is running, restart all the services whose operating system users were added to the new group.

4. During a window in which no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
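As a quick way to see what the octal value 775 grants, the following Python sketch decodes a permission mode with the standard stat module. The helper names are illustrative assumptions; the sketch only interprets a mode value and does not touch any directory.

```python
import stat


def describe_dir_mode(mode):
    """Render an octal permission mode as an ls-style string for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True if the group write bit is set, which shared downloads rely on."""
    return bool(mode & stat.S_IWGRP)


shared = describe_dir_mode(0o775)
```

Here 0o775 renders as drwxrwxr-x: the owner and the group can read, write, and traverse the directory, while all other users can only read and traverse it. A mode such as 0o755 would leave the group without write access, so other members of pbdownloads could not update the downloaded data.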


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that can contain wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
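The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is an illustration only: the helper names are hypothetical, matching is assumed to be case-sensitive, and no escape character is handled, none of which is confirmed for MI SQL.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


ends_with_park = like("Central Park", "%Park")
```

Combining the symbols works as expected in this sketch: "%Park%" matches any value containing Park, while "P_rk" matches exactly four characters beginning with P and ending with rk.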


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols. The % (percent) represents zero, one, or more characters; the _ (underscore) represents a single character.

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
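When filter expressions are assembled in code, the quote rules above can be applied with two small helpers. The Python sketch below is illustrative: the function names are hypothetical, and because the guide does not say how to embed a double quote inside an identifier, the identifier helper simply rejects that case.

```python
def sql_string_literal(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def sql_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are rejected)."""
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    sql_identifier("Streets"), sql_string_literal("O'haras")
)
```

The resulting query text is SELECT * FROM "Streets" WHERE Business = 'O''haras', which matches the doubled-quote form shown in the example above.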


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

Return Values

Return Type Description

Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_XMax

Description

The ST_XMax function returns the maximum X ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

Return Values

Return Type Description

Double The maximum X ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_XMin

Description

The ST_XMin function returns the minimum X ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

Return Values

Return Type Description

Double The minimum X ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;


ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;


ST_YMax

Description

The ST_YMax function returns the maximum Y ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

Return Values

Return Type Description

Double The maximum Y ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;


ST_YMin

Description

The ST_YMin function returns the minimum Y ordinate of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

Return Values

Return Type Description

Double The minimum Y ordinate of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
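The observer functions in this group can be summarized with a short Python sketch that works on GeoJSON-like dictionaries. The representation and the helper names (point_x, bounds) are assumptions made for illustration; the sketch mirrors the documented behavior of returning a null value for missing or unsuitable input, and does not represent the WritableGeometry type.

```python
def _vertices(coords):
    """Yield every (x, y) vertex from arbitrarily nested coordinate lists."""
    if isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from _vertices(part)


def point_x(geometry):
    """Analogue of ST_X: the X ordinate of a point, otherwise None."""
    if geometry is None or geometry.get("type") != "Point":
        return None
    return geometry["coordinates"][0]


def bounds(geometry):
    """Analogue of ST_XMin, ST_YMin, ST_XMax, ST_YMax as one tuple, or None."""
    if geometry is None:
        return None
    xs, ys = zip(*((v[0], v[1]) for v in _vertices(geometry["coordinates"])))
    return min(xs), min(ys), max(xs), max(ys)


polygon = {"type": "Polygon", "coordinates": [[(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]]}
point = {"type": "Point", "coordinates": (2.5, 7.0)}
polygon_bounds = bounds(polygon)
```

For a point, the minimum and maximum ordinates coincide with the point's own X and Y values, which is why ST_X and ST_Y are only defined for points while the Min and Max functions accept any geometry.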


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
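As an illustration of how a point is hashed to a grid cell, here is a minimal Python sketch of the publicly documented geohash algorithm (alternating longitude and latitude bisection, five bits per base-32 character). It is an assumption that GeoHashID follows this standard scheme; the sketch is not the product's implementation, and the function name `geohash_encode` is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Encode a WGS 84 longitude/latitude as a geohash string of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0        # bits accumulated for the current character
    bit_count = 0
    use_lon = True  # the first bit always splits the longitude range
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


cell_id = geohash_encode(-73.750333, 42.736103, 3)
```

Because each extra character subdivides the previous cell, IDs that share a prefix belong to the same larger cell, which is what makes such IDs convenient for sorting and grouping.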


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
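The relationship between a cell ID and its rectangular boundary can be sketched in Python by decoding a standard geohash back into its bounding box. This assumes the product's geohash IDs follow the public geohash scheme; the helper names `geohash_bounds` and `bounds_to_wkt` are hypothetical, and the product returns a WritableGeometry rather than a tuple or string.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(cell_id: str):
    """Return (min_lon, min_lat, max_lon, max_lat) of a standard geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    use_lon = True
    for ch in cell_id:
        value = BASE32.index(ch)
        # Each character carries five bits, most significant first.
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


def bounds_to_wkt(bounds) -> str:
    """Format a bounding box as a closed WKT polygon ring."""
    x1, y1, x2, y2 = bounds
    return f"POLYGON (({x1} {y1}, {x2} {y1}, {x2} {y2}, {x1} {y2}, {x1} {y1}))"


box = geohash_bounds("dre")
wkt = bounds_to_wkt(box)
```

The cell "dre" used here is the standard three-character geohash that contains the point (-73.750333, 42.736103) from the examples above.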


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
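The idea behind a point-in-polygon search can be sketched with the classic ray-casting test. The Python below runs outside the product and is only an illustration under stated assumptions: it handles simple polygon rings only (not lines, points, or holes), and the function names and the sample "states" are hypothetical, not part of the SDK.

```python
def point_in_polygon(x: float, y: float, ring) -> bool:
    """Ray-casting test: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def find_containing(x: float, y: float, features):
    """Return the attributes of every feature whose ring contains the point."""
    return [attrs for attrs, ring in features if point_in_polygon(x, y, ring)]


# Hypothetical data: two square "states" with attribute dictionaries.
features = [
    ({"state": "A", "capital": "Alpha"}, [(0, 0), (4, 0), (4, 4), (0, 4)]),
    ({"state": "B", "capital": "Beta"}, [(4, 0), (8, 0), (8, 4), (4, 4)]),
]
matches = find_containing(2.0, 2.0, features)
```

Like the LATERAL VIEW query above, the search yields one result row per matching feature, so a point that falls in overlapping polygons produces several rows.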


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries in the specified TAB or shapefile that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The geometry or geometries in the specified TAB or shapefile that are nearest to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'"))
nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
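To show how the maxCandidates and maxDistance options interact, here is a brute-force nearest-point search in Python. It is a sketch under stated assumptions, not the UDTF's algorithm: candidates are treated as points only, distances use the haversine great-circle formula on a spherical earth, and the names `haversine_m`, `search_nearest`, and the sample candidates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters; a spherical approximation


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(lon, lat, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    scored = []
    for name, c_lon, c_lat in candidates:
        d = haversine_m(lon, lat, c_lon, c_lat)
        # With no maximum distance, every candidate is eligible.
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Hypothetical candidates as (name, longitude, latitude).
candidates = [("A", 0.0, 0.0), ("B", 0.0, 1.0), ("C", 0.0, 3.0)]
nearest_two = search_nearest(0.0, 0.1, candidates, max_candidates=2)
```

The defaults mirror the documented ones: a single result and no distance limit. A production search over many features would rely on a spatial index instead of scanning every candidate.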


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, it generates the same hexagon with a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
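When choosing a geohashPrecision value, it can help to compare the cell size at each precision with the search distance. The Python sketch below computes cell sizes for the standard geohash scheme; it is an assumption that joinByDistance uses standard geohash cells, the meters-per-degree constant is a rough equatorial approximation, and the function names are hypothetical. It does not reproduce how the join itself is executed.

```python
def geohash_cell_size_deg(precision: int) -> tuple:
    """Return (width, height) in degrees of a standard geohash cell."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude receives the extra bit when odd
    lat_bits = total_bits // 2
    return 360.0 / (1 << lon_bits), 180.0 / (1 << lat_bits)


METERS_PER_DEGREE_AT_EQUATOR = 111_320.0  # rough approximation, for illustration only


def geohash_cell_width_m(precision: int) -> float:
    """Approximate east-west cell width in meters at the equator."""
    width_deg, _ = geohash_cell_size_deg(precision)
    return width_deg * METERS_PER_DEGREE_AT_EQUATOR


# Print approximate cell widths for a few precisions.
for p in (5, 6, 7):
    print(p, round(geohash_cell_width_m(p)), "m")
```

For comparison, the 0.5 mile radius used in the examples above is about 805 meters.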


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue -> Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81

Download Permissions 83

Operators and Syntax Delimiters 84

Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f ltfilegt yes Path and file name for the TAB file for which you are generating a PGD file

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 81

Appendix

Parameter Required Description

-p ltparallelgt no The maximum number of concurrent processes to run The default is the number of CPUs available on the machine

When generating PGD files for a seamless TAB one thread is used for each sub-TAB and these threads are run concurrently In this case consider setting a lower number to prevent performance issues with your machine

Examples

This request will generate a PGD file for the uktab file

PGDBuilder -f Cdatauktab

This request will generate a PGD file for a seamless TAB file limited to 4 concurrent threads

PGDBuilder -f Cseamlesstab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have 775 permissions: Read, Write, and Execute for the owner and group, and Read and Execute for all other users.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
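To see what the 775 mode actually grants, the following minimal Python sketch decodes an octal mode into the familiar permission string using only the standard library. The helper name describe_mode is hypothetical and is not part of the product.

```python
import stat


def describe_mode(mode: int) -> str:
    """Return the ls-style permission string for a directory with this mode."""
    # stat.filemode expects a full st_mode value, so mark it as a directory.
    return stat.filemode(stat.S_IFDIR | mode)


if __name__ == "__main__":
    # Owner and group get read/write/execute; other users get read/execute.
    print(describe_mode(0o775))
```

The same helper could be pointed at the real directory by passing os.stat(path).st_mode & 0o777, which is one way to confirm that step 5 took effect.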

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
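The wildcard behavior described for the Like operator can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is only an illustration of the semantics in the table: the function names are hypothetical, and the case-sensitive matching shown here is an assumption, since the guide does not state how MI SQL treats case.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern":
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")   # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("Central Park", "%Park"))
    print(like("cat", "c_t"))
```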

Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' | String constant delimiters. See Quote Rules.
" | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, | List item and function argument separators
@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers with illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
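When queries are assembled programmatically, the quoting rules above can be applied with two small helpers. The following Python sketch is illustrative only: the function names are hypothetical, and because the guide does not say how a double quote inside an identifier should be escaped, the identifier helper simply rejects that case.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Escaping rule not covered by the guide, so refuse rather than guess.
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    country = quote_literal("Canada")
    print(f"SELECT * FROM {table} WHERE Country = {country}")
    print(quote_literal("O'Hara's"))
```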

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com

ST_XMax

Description

The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The maximum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The minimum X value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The maximum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry

Return Values

Return Type | Description
Double | The minimum Y value of the input geometry, or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The geohash ID of the grid cell, at the specified precision, that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
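To make the relationship between a point, a precision, and a string ID concrete, here is a minimal Python sketch of the publicly documented geohash encoding (alternating longitude and latitude bisection, five bits per base-32 character). It is an assumption that this matches what GeoHashID computes internally; the function name geohash_encode is hypothetical, and the sketch is not the product's implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Encode a WGS 84 longitude/latitude as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0          # accumulates the current 5-bit group
    bit_count = 0
    use_lon = True    # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(-73.750333, 42.736103, 3))  # dre
```

Because each additional character subdivides the previous cell, a longer ID always starts with the shorter ID of its parent cell, which is what makes the IDs useful for sorting and grouping.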

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell, at the specified precision, that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell, at the specified precision, that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'

Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Examples

Using HDFS:

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
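The kind of containment test behind a point-in-polygon search can be illustrated with the classic ray-casting algorithm. The Python sketch below is a simplified planar illustration under stated assumptions: it ignores holes, coordinate reference systems, and points that fall exactly on a boundary, and the names point_in_ring and find_containing are hypothetical. It is not how the UDTF is implemented.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_ring(point: Point, ring: List[Point]) -> bool:
    """Ray casting: count how many ring edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_containing(point: Point, polygons: Dict[str, List[Point]]) -> List[str]:
    """Return the names of all polygons that contain the point."""
    return [name for name, ring in polygons.items() if point_in_ring(point, ring)]


if __name__ == "__main__":
    polygons = {
        "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
        "B": [(4, 4), (20, 4), (20, 20), (4, 20)],
    }
    print(find_containing((5, 5), polygons))
```

As in the UDTF description, a point that lies in several overlapping polygons yields several results, one per containing geometry.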

LocalSearchNearest

Description

The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point

Function Registration

create function LocalSearchNearest as compbbigdataspatialhivesearchLocalSearchNearest

Syntax

LocalSearchNearest(WritableGeometry inputPoint String dataSourcePath [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String the location of the input TAB or shapefile The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node

Note If you are storing and distributing your data remotely using HDFS or S3 you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below

options map Optional Options that allow you to return more than one value return additional information or set other return criteria in ltString Stringgt format

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system. The default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The geometry or geometries in the specified TAB or shapefile that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like 'Park'")
) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude, the API generates each hexagon with a unique and consistent ID at a given level.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to.

df1Longitude Column The longitude value from the first dataframe.

df1Latitude Column The latitude value from the first dataframe.

df2Longitude Column The longitude value from the second dataframe.

df2Latitude Column The latitude value from the second dataframe.

maxDistance Length The buffer length around point 1 to search for point 2.

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
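To make the semantics of a distance join concrete, the following is a minimal Python sketch, not the product's implementation. It assumes a brute-force comparison of every pair of points and a haversine great-circle distance on a spherical earth; the names haversine_miles and join_by_distance, the sample rows, and the earth radius constant are all illustrative. The real joinByDistance method uses a geohash-based primary join, which this sketch does not attempt.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean earth radius; an approximation for illustration


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Brute-force distance join over rows of the form (id, lon, lat).

    Returns (left_id, right_id, distance) for every pair within max_miles.
    """
    matches = []
    for lid, llon, llat in left:
        for rid, rlon, rlat in right:
            d = haversine_miles(llon, llat, rlon, rlat)
            if d <= max_miles:
                matches.append((lid, rid, d))
    return matches


# Hypothetical sample data: one customer and two points of interest.
customers = [("c1", -73.750333, 42.736103)]
pois = [
    ("near", -73.751333, 42.736103),  # 0.001 degrees of longitude to the west
    ("far", -74.750333, 42.736103),   # one full degree of longitude to the west
]
result = join_by_distance(customers, pois, 0.5)
```

With a 0.5-mile limit, only the "near" point pairs with the customer, and the third element of the tuple plays the role of the optional distance column.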


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue -> Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file is no longer used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
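After changing the mode, it can be useful to confirm that the directory really carries the 775 permission bits. The following Python sketch shows one way to check this with the standard library. The helper name has_mode_775 is hypothetical, and the sketch runs against a temporary directory rather than a real download location, so it makes no assumption about where your data is stored.

```python
import os
import stat
import tempfile


def has_mode_775(path):
    """Return True if the permission bits of path are exactly rwxrwxr-x (775)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o775


# Demonstrate on a throwaway directory instead of a real download location.
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o700)          # owner-only access, as mkdtemp creates it
before = has_mode_775(demo_dir)
os.chmod(demo_dir, 0o775)          # same effect as: chmod 775 <directory>
after = has_mode_775(demo_dir)
os.rmdir(demo_dir)
```

To check an actual download directory, call has_mode_775 with the path set in your pb.download.location property.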


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
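The wildcard behavior described for the Like operator can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is an illustration of the matching rules only, not the MI SQL engine: the function names are hypothetical, the match is case-sensitive (the guide does not state how Like treats case), and no escape character is handled.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


ends_with_park = like("Central Park", "%Park")
one_extra_char = like("Parks", "Park_")
```

Because the whole value must match, the pattern %Park accepts "Central Park" and "Park" but rejects "Parkway", while Park_ requires exactly one character after "Park".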


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List item and function argument separator

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers containing illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
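When filter expressions are built programmatically, the doubling rule is easy to apply with a small helper. The Python sketch below is illustrative only: quote_literal and quote_identifier are hypothetical names, and quote_identifier assumes the identifier contains no double quote, a case the rules above do not cover.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (assumes it contains no double quote)."""
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
```

Quoting the Streets identifier is optional under the rules above; it is quoted here only to show both helpers in one statement.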


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com


ST_XMin

Description

The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

Return Values

Return Type Description

Double The minimum X value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

Return Values

Return Type Description

Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry.

Return Values

Return Type Description

Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid.

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
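To show how a rectangular cell boundary follows from a hash string, here is a minimal Python sketch of the widely published geohash decoding scheme. It is an illustration under the assumption that GeoHashBoundary follows that public scheme; the guide does not show the product's implementation, and the function name geohash_bounds and the sample ID are chosen for this example only.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) for a geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    is_lon = True  # bits alternate between axes, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the range
            else:
                rng[1] = mid  # keep the lower half of the range
            is_lon = not is_lon
    return lon[0], lat[0], lon[1], lat[1]


bounds = geohash_bounds("ezs42")
```

Each additional character halves the cell five more times, which is why longer IDs correspond to smaller cells.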


GeoHashID

Description

The GeoHashID function returns a unique well-known string ID for the grid cell The ID then is sortable and searchable that corresponds to the specified X Y and precision

Function Registration

create function GeoHashID as compbbigdataspatialhivegridGeoHashID

Syntax

GeoHashID(Number|String X Number|String Y Number precision)

Parameters

Parameter Type Description

Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable

SELECT GeoHashID(-73.750333, 42.736103, 3)

SELECT GeoHashID('-73.750333', '42.736103', 3)

CREATE TEMPORARY TABLE tmptbl AS
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

INSERT INTO TABLE coordinates_with_hash
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID
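The grouping query above counts points per grid cell. As a rough illustration of the same idea outside Hive, the following Python sketch encodes points with the widely used public geohash scheme and counts them per cell. This is an assumption made for illustration only: geohash_id is a hypothetical helper, not the SDK function, and the product's IDs may differ.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a WGS 84 point as a geohash string of the given length."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    use_lon = True  # bits alternate, starting with longitude
    value = 0
    nbits = 0
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid  # point is in the upper half
        else:
            rng[1] = mid  # point is in the lower half
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            nbits = 0
    return "".join(chars)


# Hypothetical sample points as (longitude, latitude).
points = [(-73.750333, 42.736103), (-73.70, 42.70), (2.35, 48.85)]

# Equivalent in spirit to GROUP BY hashID with count(*).
counts = Counter(geohash_id(x, y, 3) for x, y in points)
print(counts)
```

Because a longer ID always begins with the shorter ID of its parent cell, sorting by ID keeps nearby points in the same cell together, which is the property the ORDER BY hashID examples rely on.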


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary'

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable

SELECT HexagonBoundary('PF625028642')

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable

SELECT HexagonBoundary(-73.750333, 42.736103, 3)


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID'

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable

SELECT HexagonID(-73.750333, 42.736103, 3)

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary'

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable

SELECT SquareHashBoundary('03332')

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable

SELECT SquareHashBoundary(-73.750333, 42.736103, 3)

SELECT SquareHashBoundary('-73.750333', '42.736103', 3)


SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID'

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable

SELECT SquareHashID(-73.750333, 42.736103, 3)

SELECT SquareHashID('-73.750333', '42.736103', 3)

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. This UDTF returns every geometry that the search point lies on or within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon'

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
'downloadLocation', '/pb/pip/download',
'downloadGroup', 'pbdownloads'))
pipresult

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
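To show what a point-in-polygon search does conceptually, the following Python sketch tests a point against a small set of polygons with the classic ray-casting rule and returns every polygon that contains it. It is a simplified illustration under stated assumptions (planar coordinates, simple rings without holes, hypothetical polygon names and function names) and does not represent how the UDTF is implemented.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many ring edges a ray to the right of (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def find_containing(x, y, polygons):
    """Return the names of all polygons that contain the point."""
    return [name for name, ring in polygons.items() if point_in_polygon(x, y, ring)]


# Hypothetical overlapping squares standing in for rows of a TAB or shapefile.
polygons = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(3, 3), (6, 3), (6, 6), (3, 6)],
}

print(find_containing(3.5, 3.5, polygons))
```

A point in the overlap of two polygons yields two results, which mirrors the behavior described above where every matching geometry produces an output row.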


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest'

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance on page 33 function for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
'STATECAP.TAB',
map('maxCandidates', '3',
'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
'downloadLocation', '/pb/search/download',
'downloadGroup', 'pbdownloads',
'queryFilter', "Name Like 'Park'")) nearestresult

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
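The interaction of the maxCandidates and maxDistance options can be sketched in a few lines of Python. The example below is an illustration only: it assumes point candidates, great-circle (haversine) distance in meters, and hypothetical names such as search_nearest and the sample capitals list; it is not the UDTF's implementation.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    scored = []
    for name, lon, lat in candidates:
        d = haversine_m(point[0], point[1], lon, lat)
        # With no maxDistance, every candidate is eligible (no limit).
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Hypothetical candidates as (name, longitude, latitude).
capitals = [
    ("Boston", -71.0589, 42.3601),
    ("Albany", -73.7562, 42.6526),
    ("Hartford", -72.6734, 41.7658),
]
point = (-73.750333, 42.736103)

print(search_nearest(point, capitals, max_candidates=3))
```

With the defaults, only the single nearest candidate is returned; raising max_candidates widens the result, while max_distance_m can shrink it again regardless of how many candidates were requested.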


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
--master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
/dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
-output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from the second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
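The semantics of a distance join can be illustrated without Spark. The following Python sketch pairs every record from one list with every record from another that lies within a maximum distance, and optionally adds a distance column to each result row. It is a brute-force illustration under stated assumptions (haversine distance in miles, hypothetical field names and function names) and omits the geohash-based primary join that the real method uses to avoid comparing every pair.

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean earth radius in miles


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles, distance_column=None):
    """Pair each left row with every right row within max_miles of it."""
    rows = []
    for l in left:
        for r in right:
            d = haversine_miles(l["Longitude"], l["Latitude"], r["Lon"], r["Lat"])
            if d <= max_miles:
                row = {**l, **r}  # merged attributes from both sides
                if distance_column:
                    row[distance_column] = d
                rows.append(row)
    return rows


# Hypothetical sample data.
customers = [{"id": 1, "Longitude": -73.750333, "Latitude": 42.736103}]
pois = [
    {"poi": "near", "Lon": -73.7510, "Lat": 42.7400},
    {"poi": "far", "Lon": -73.70, "Lat": 42.70},
]

result = join_by_distance(customers, pois, 0.5, distance_column="outputDistance")
print(result)
```

Only the POI within half a mile of the customer appears in the result, and the optional column plays the role that DistanceColumnName plays in the options map.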


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver \
--master yarn --deploy-mode cluster \
--jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
/path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
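For readers less familiar with octal permission modes, the following Python sketch decodes 775 into the familiar rwx notation using the standard library stat module. The helper names are illustrative only.

```python
import stat


def describe_mode(mode):
    """Render a directory permission mode in ls-style notation."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True if members of the directory's group may write to it."""
    return bool(mode & stat.S_IWGRP)


def others_can_write(mode):
    """True if users outside the owner and group may write to it."""
    return bool(mode & stat.S_IWOTH)


mode = 0o775
print(describe_mode(mode))      # owner rwx, group rwx, others r-x
print(group_can_write(mode))
print(others_can_write(mode))
```

The group write bit is what lets every service user in the common group update the downloaded data, while users outside the group can only read and traverse the directory.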


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
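The wildcard and range semantics described for Like and Between can be illustrated outside of MI SQL. The following Python sketch is an approximation under stated assumptions: matching is case-sensitive and no escape character is supported, neither of which is specified in this guide, and the function names like and between are illustrative only.

```python
import re


def like(value, pattern):
    """Return True if value matches a Like-style pattern.

    '%' stands for zero, one, or multiple characters; '_' stands for exactly one.
    Case sensitivity and the lack of an escape character are assumptions.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None


def between(value, low, high):
    """Inclusive range test, mirroring the inclusive Between operator."""
    return low <= value <= high
```

For example, like("Central Park", "%Park") is true because the percent sign absorbs the leading text, while like("cart", "c_t") is false because the underscore stands for exactly one character.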

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.

, List item and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain characters that would not normally be allowed (such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

When a single quote appears within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
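When queries are assembled by a program, the quoting rules above can be applied with two small helpers. This Python sketch is illustrative only: quote_literal and quote_identifier are hypothetical names, and because this guide does not describe how to escape a double quote inside an identifier, the sketch rejects such identifiers instead of guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Escaping of embedded double quotes is not described in the guide,
    so identifiers that contain one are rejected (an assumption of this sketch).
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    query = "SELECT * FROM {} WHERE Business = {}".format(
        quote_identifier("Streets"), quote_literal("O'hara's")
    )
    print(query)
```

Quoting every identifier is a conservative choice; the rules above require it only when the parser cannot otherwise read the name.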

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
      • Spatial
        • Installing the SDK
        • Hive User-Defined Spatial Functions
          • Setup
          • WritableGeometry
          • Geometry Functions
            • Constructor Functions
              • FromGeoJSON
              • FromWKT
              • FromWKB
              • FromKML
              • ST_Point
                • Persistence Functions
                  • ToGeoJSON
                  • ToWKT
                  • ToWKB
                  • ToKML
                    • Predicate Functions
                      • Disjoint
                      • Intersects
                      • Overlaps
                      • Within
                      • IsNullGeometry
                        • Measurement Functions
                          • Area
                          • ClosestPoints
                          • Distance
                          • Length
                          • Perimeter
                            • Processing Functions
                              • Buffer
                              • ConvexHull
                              • Intersection
                              • Transform
                              • Union
                                • Observer Functions
                                  • ST_X
                                  • ST_XMax
                                  • ST_XMin
                                  • ST_Y
                                  • ST_YMax
                                  • ST_YMin
                                    • Grid Functions
                                      • GeoHashBoundary
                                      • GeoHashID
                                      • HexagonBoundary
                                      • HexagonID
                                      • SquareHashBoundary
                                      • SquareHashID
                                          • Search Functions
                                            • LocalPointInPolygon
                                            • LocalSearchNearest
                                                • Spark
                                                  • Spark Jobs
                                                    • Hexagon Generator
                                                      • Hexagons
                                                          • Spark API
                                                            • JoinByDistance
                                                              • Location Intelligence Jar for Spatial Operations
                                                                • Installing and setting up the Location Intelligence Jar for a Spark Job
                                                                • Integrating the Location Intelligence Jar with Zeppelin
                                                                • Integrating the Location Intelligence Jar with Hue
                                                                  • Appendix
                                                                    • PGD Builder
                                                                      • Building an Index with the PGD Builder
                                                                        • Download Permissions
                                                                        • Operators and Syntax Delimiters
                                                                          • Quote Rules

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMax(FromWKT('…', 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT('…', 'epsg:4326')) FROM src;
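To make the values returned by ST_Y, ST_YMin, and ST_YMax concrete, the following Python sketch computes them for a geometry represented as a plain list of (x, y) tuples. That representation and the function names are assumptions made for illustration; they are not the product's WritableGeometry API.

```python
def y_of_point(coords):
    """Return the Y ordinate if the geometry is a single point, otherwise None.

    Returning None mirrors the Null result described for non-point geometries.
    """
    if coords is None or len(coords) != 1:
        return None
    return coords[0][1]


def y_extent(coords):
    """Return (y_min, y_max) over all vertices, or None if there are no vertices."""
    if not coords:
        return None
    ys = [y for _, y in coords]
    return min(ys), max(ys)
```

The extent functions look only at the vertices, which is why they apply to any geometry type, while the single-ordinate function is meaningful only for points.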

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash, but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
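The relationship between a geohash ID and its rectangular cell can be sketched with the widely used public geohash scheme (base-32 characters, with bits alternating between longitude and latitude). The Python below is an illustration of that general scheme, not the product's implementation; whether GeoHashBoundary produces exactly these cells is an assumption, and geohash_bounds is a hypothetical name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) for a standard geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    is_lon = True  # the first bit of a geohash refines longitude
    for ch in geohash:
        value = BASE32.index(ch)  # raises ValueError for characters outside the alphabet
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2.0
            if bit:
                rng[0] = mid  # keep the upper half of the interval
            else:
                rng[1] = mid  # keep the lower half of the interval
            is_lon = not is_lon
    return lon[0], lat[0], lon[1], lat[1]
```

Each additional character halves the cell several more times, which is why longer IDs correspond to smaller cells.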

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
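The last query groups points by cell ID and counts them. The same idea can be sketched in Python using the public geohash encoding. This is an illustration of the general technique only: it is assumed, not confirmed by this guide, that GeoHashID follows the standard geohash scheme, and geohash_id and count_by_cell are hypothetical names.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a standard geohash of the given length."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    is_lon = True  # bits alternate, starting with longitude
    value = 0
    bits = 0
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if is_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)


def count_by_cell(points, precision):
    """Count (lon, lat) points per cell ID, similar to GROUP BY hashID with count(*)."""
    return Counter(geohash_id(lon, lat, precision) for lon, lat in points)
```

Because a shorter ID is always a prefix of the longer ID for the same point, sorting by ID tends to keep nearby points together, which is one reason such IDs are convenient to sort and search.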

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries within which the search point lies are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
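The behavior of returning every polygon that contains the search point can be sketched with a standard ray-casting test. The Python below illustrates the general technique on plain coordinate lists; it is not the product's search implementation, it ignores holes and points exactly on a boundary, and the names point_in_ring and polygons_containing are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) is strictly inside the closed ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that cross the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def polygons_containing(point, polygons):
    """Return the names of all polygons (name -> ring) that contain the point."""
    x, y = point
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]
```

When polygons overlap, more than one name is returned for a single point, which matches the description of all containing geometries being returned.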

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'"))
nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
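The effect of the maxCandidates and maxDistance options can be illustrated with a brute-force nearest search over point data. This Python sketch is a simplified model, not the product's algorithm: it handles only point candidates, measures great-circle distance in meters on a spherical earth, and uses hypothetical names (haversine_m, search_nearest).

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius; a spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance_m) pairs, nearest first.

    candidates is a list of (name, lon, lat). The defaults mirror the documented
    ones: one result, and no distance limit when max_distance_m is None.
    """
    scored = [(haversine_m(point[0], point[1], lon, lat), name) for name, lon, lat in candidates]
    if max_distance_m is not None:
        scored = [item for item in scored if item[0] <= max_distance_m]
    scored.sort()
    return [(name, dist) for dist, name in scored[:max_candidates]]
```

Applying the distance limit before truncating to max_candidates means a search can return fewer results than requested, or none at all.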

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the generated hexagon always has the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions see Location Intelligence Jar for Spatial Operations on page 77

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
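The semantics of a distance join can be illustrated outside Spark with a small, self-contained Python sketch. This is not the product's implementation: it compares every pair of records using the haversine formula with an approximate mean earth radius, whereas the product uses a geohash-based primary join as described for the geohashPrecision parameter. The record layout, the function names, and the inclusive comparison are assumptions made for illustration only.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean earth radius (assumption)


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Pair each left record with every right record within max_miles (brute force)."""
    result = []
    for l in left:
        for r in right:
            d = haversine_miles(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_miles:
                # Merge both records and add the computed distance as an extra column.
                result.append({**l, **r, "outputDistance": d})
    return result


# Hypothetical sample data: one customer and two points of interest.
customers = [{"customer": "A", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.750333, "lat": 42.741103},  # 0.005 degrees north
    {"poi": "far", "lon": -73.750333, "lat": 42.756103},   # 0.02 degrees north
]

joined = join_by_distance(customers, pois, 0.5)
for row in joined:
    print(row["customer"], row["poi"], round(row["outputDistance"], 3))
```

Only the nearer point of interest falls inside the half-mile search radius, so the result contains a single joined row carrying the attributes of both records plus the distance.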


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1 Add the group

sudo groupadd pbdownloads

2 Add users to the group

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
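To confirm what mode 775 grants, the following Python sketch applies it to a throwaway directory and decodes the result. It is a minimal illustration, assuming a POSIX system; the temporary directory stands in for the real download directory, and changing the group (the chgrp step) is left out because it normally requires elevated privileges.

```python
import os
import stat
import tempfile

# A temporary directory stands in for the real download location (assumption).
download_dir = tempfile.mkdtemp()

# Equivalent of: chmod 775 <download_dir>
os.chmod(download_dir, 0o775)

st_mode = os.stat(download_dir).st_mode
mode = stat.S_IMODE(st_mode)          # permission bits only
mode_string = stat.filemode(st_mode)  # ls-style rendering of type and permissions

# 775 means rwx for the owner, rwx for the group, and r-x for everyone else.
group_can_write = bool(mode & stat.S_IWGRP)
others_can_write = bool(mode & stat.S_IWOTH)

print(oct(mode), mode_string, group_can_write, others_can_write)

os.rmdir(download_dir)  # clean up the throwaway directory
```

Group write access is what lets every service user in the common group update the downloaded data, while users outside the group can only read and traverse the directory.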


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first objects centroid is within the second object

Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object

In (List) Returns true if equals at least one of the values in the literal list or sub query

Like Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
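The wildcard and range semantics described above can be sketched in Python. This is an illustration of the documented behavior, not the MI SQL engine: the function names are hypothetical, and details the table does not state (such as case sensitivity or escape characters) are assumptions.

```python
import re


def like(value, pattern):
    """Return True if value matches a LIKE-style pattern.

    '%' matches zero, one, or multiple characters; '_' matches exactly one.
    Matching here is case-sensitive, which is an assumption.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat every other character literally
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


def between(value, low, high):
    """Inclusive range test, mirroring the Between operator."""
    return low <= value <= high


print(like("Canada", "Can%"))    # percent matches the remaining characters
print(like("Canada", "C_nada"))  # underscore matches the single character 'a'
print(like("Canada", "Can_"))    # fails: underscore matches only one character
print(between(5, 1, 5))          # the upper bound is included
```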


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers containing illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
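When query text is assembled programmatically, the quoting rules above can be applied with two small helpers. The Python sketch below is illustrative only: the helper names, the table name, and the sample value are hypothetical, and the identifier helper does not attempt to handle identifiers that themselves contain double quotes.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    return '"' + name + '"'


# Hypothetical table and value used only to demonstrate the rules.
table = quote_identifier("My Streets")
business = quote_literal("Joe's Diner")

query = "SELECT * FROM " + table + " WHERE Business = " + business
print(query)
```

The identifier is quoted because it contains a space, and the apostrophe inside the value is doubled so that it is not read as the end of the string literal.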


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y maxima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter Type Description

geometry WritableGeometry The input geometry

Return Values

Return Type Description

Double The Y minima of the input geometry or Null if the specified value is not a geometry

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary("ebvnk");

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary("-73.750333", "42.736103", 3);
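To show how a rectangular cell boundary can be recovered from an ID, the following Python sketch decodes a geohash using the widely published base-32 geohash scheme. It is an illustration under that assumption, not the product's code, and the function names are hypothetical; the UDF's actual output geometry may differ in form.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(hash_id):
    """Return (min_lon, min_lat, max_lon, max_lat) for a standard geohash."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # each character carries 5 bits
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the range
            else:
                rng[1] = mid  # keep the lower half of the range
            is_lon = not is_lon
    return lon[0], lat[0], lon[1], lat[1]


def bounds_to_wkt(bounds):
    """Render the rectangle as a closed WKT polygon ring."""
    min_lon, min_lat, max_lon, max_lat = bounds
    return (f"POLYGON (({min_lon} {min_lat}, {max_lon} {min_lat}, "
            f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))")


bounds = geohash_bounds("dre")
print(bounds)
print(bounds_to_wkt(bounds))
```

Under the standard scheme, the three-character cell dre spans longitude -74.53125 to -73.125 and latitude 42.1875 to 43.59375, which contains the point (-73.750333, 42.736103) used in the examples above. Each additional character subdivides the cell further, which is why longer IDs mean smaller cells.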


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
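The aggregation pattern in the last query (hash each point, then group by the hash) can be sketched in plain Python. The encoder below follows the widely published base-32 geohash scheme and is an assumption about how IDs are formed rather than the product's code; the function name and sample coordinates are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a standard geohash of the given length."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    chars = []
    value, bits = 0, 0
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, coord = (lon, x) if is_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # point lies in the upper half
        else:
            value <<= 1
            rng[1] = mid  # point lies in the lower half
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


# Hypothetical (x, y) records standing in for the coordinates table.
coordinates = [(-73.750333, 42.736103), (-73.70, 42.70), (10.40744, 57.64911)]

# Equivalent of: SELECT hashID, count(*) ... GROUP BY hashID
counts = Counter(geohash_id(x, y, 3) for x, y in coordinates)
for hash_id, quantity in sorted(counts.items()):
    print(hash_id, quantity)
```

The first two points fall in the same three-character cell and are counted together, while the third lands in a different cell. Raising the precision shrinks the cells, so fewer points share an ID.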


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable

SELECT HexagonBoundary("PF625028642");

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable

SELECT SquareHashBoundary("03332");

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X Number|String Y Number precision)

Parameters

Parameter Type Description

Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF function returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. This UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF function returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
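The command in step 3 can be wrapped in a small helper so that the placeholders are filled in the same way each time. The following is a minimal sketch and not part of the product: the function name hexgen_command and the sample argument values are hypothetical, and the helper only prints the command for review (a dry run) rather than submitting a job.

```shell
#!/bin/sh
# Hypothetical helper: assemble the Hexagon Generator spark-submit command.
# It prints the command instead of running it, so it can be reviewed first.
hexgen_command() {
  app_name=$1   # value for --name
  jar=$2        # path to the hexgen jar on the server
  output=$3     # output directory on HDFS
  conf=$4       # path to the hexagon configuration
  printf 'spark-submit --class %s --master yarn --deploy-mode cluster --name %s %s -output %s -conf %s -overwrite\n' \
    'com.pb.bigdata.spatial.hex.spark.app.HexGenDriver' \
    "$app_name" "$jar" "$output" "$conf"
}

# Example values are placeholders, not product defaults.
hexgen_command hexgen-demo /dir/on/server/hexgen.jar /dir/on/hdfs/output /dir/on/server/hexgen.conf
```

Printing the command first makes it easy to confirm the jar and configuration paths before the job is started on the cluster.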

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method which joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case:

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
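For the EMR path, the edit to hue.ini (uncommenting the use_default_configuration line and setting it to true) can also be scripted. The following is a minimal sketch under stated assumptions: the function name enable_default_configuration is hypothetical, the commented-out line is assumed to start with one or more # characters, and the demonstration runs against a scratch file rather than the real configuration file.

```shell
#!/bin/sh
# Sketch: uncomment use_default_configuration and set it to true.
# Leading whitespace is preserved; any leading '#' characters are removed.
enable_default_configuration() {
  ini=$1
  tmp="$ini.tmp"
  sed 's/^\([[:space:]]*\)#*[[:space:]]*use_default_configuration[[:space:]]*=.*/\1use_default_configuration=true/' \
    "$ini" > "$tmp" && mv "$tmp" "$ini"
}

# Demonstrate on a scratch file rather than the real hue.ini.
scratch=$(mktemp)
printf '[desktop]\n  ## use_default_configuration=false\n' > "$scratch"
enable_default_configuration "$scratch"
cat "$scratch"
```

Because the helper replaces the file with a rewritten copy, back up the original and check its ownership and permissions afterwards if you adapt this for the real file.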


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
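Because the builder accepts one TAB file per invocation, indexing a whole directory means running it once per file. The following is a minimal sketch and not part of the product: the function name pgd_commands is hypothetical, it assumes the utility is invoked as PGDBuilder as in the usage line, and it only prints the commands (a dry run) so they can be reviewed before being executed.

```shell
#!/bin/sh
# Sketch: print one PGDBuilder command per TAB file in a directory.
pgd_commands() {
  dir=$1
  parallel=$2
  for tab in "$dir"/*.tab; do
    [ -e "$tab" ] || continue   # skip the unexpanded pattern when nothing matches
    printf 'PGDBuilder -f "%s" -p %s\n' "$tab" "$parallel"
  done
}

# Demonstrate on a scratch directory with two TAB files and one unrelated file.
demo_dir=$(mktemp -d)
touch "$demo_dir/uk.tab" "$demo_dir/seamless.tab" "$demo_dir/notes.txt"
pgd_commands "$demo_dir" 4
```

Only files ending in .tab are listed; on a case-sensitive file system, files named with an uppercase .TAB extension would need a second pattern.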


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
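After completing these steps, it can be useful to confirm that the download directory really has the expected group and mode. The following is a minimal sketch and not part of the product: the function name check_download_dir is hypothetical, it reads the mode and group from ls -ld output, and the demonstration uses a scratch directory and the caller's own primary group instead of the real download location.

```shell
#!/bin/sh
# Sketch: succeed only if the directory is mode 775 and owned by the given group.
check_download_dir() {
  listing=$(ls -ld "$1") || return 1
  mode=$(printf '%s\n' "$listing" | awk '{print $1}')
  dir_group=$(printf '%s\n' "$listing" | awk '{print $4}')
  case $mode in
    drwxrwxr-x*) ;;     # 775; a trailing '.' or '+' marker is tolerated
    *) return 1 ;;
  esac
  [ "$dir_group" = "$2" ]
}

# Demonstrate on a scratch directory using the caller's own primary group.
scratch_dir=$(mktemp -d)
chgrp "$(id -gn)" "$scratch_dir"
chmod 775 "$scratch_dir"
if check_download_dir "$scratch_dir" "$(id -gn)"; then
  echo "download directory permissions look correct"
fi
```

The check is deliberately strict: a directory with additional bits set (for example, setgid) is reported as not matching, so adjust the pattern if your site uses such settings.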


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first objects centroid is within the second object

Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or sub query.

Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
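The wildcard behavior described for the Like operator can be illustrated outside the product with shell patterns, where % corresponds to * and _ corresponds to ?. The following is a minimal sketch and not the MI SQL implementation: the function name like_match is hypothetical, matching here is case-sensitive (which may differ from the product), and the pattern is assumed to contain no shell pattern characters of its own.

```shell
#!/bin/sh
# Sketch: emulate Like wildcards with shell patterns ('%' -> '*', '_' -> '?').
# Assumes the pattern contains no shell pattern characters such as * ? [ or \.
like_match() {
  value=$1
  glob=$(printf '%s' "$2" | tr '%_' '*?')
  case $value in
    $glob) return 0 ;;   # the whole value must match the pattern
    *) return 1 ;;
  esac
}

like_match 'Central Park' '%Park' && echo 'Central Park matches %Park'
like_match 'Park' 'P_rk' && echo 'Park matches P_rk'
like_match 'Parkway' '%Park' || echo 'Parkway does not match %Park'
```

As with Like, the pattern must cover the entire value, so a trailing % is needed to match values that continue after the fixed text.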


Arithmetic Operators

Operator Definition

+ Addition; also concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal Ohara's is defined:

SELECT * FROM Streets WHERE Business = 'Ohara''s'
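When filter expressions are assembled by a script, the doubling rule for embedded single quotes can be applied automatically. The following is a minimal sketch and not part of the product: the function name sql_string_literal and the sample value are hypothetical, and the helper covers only the single-quote rule described above.

```shell
#!/bin/sh
# Sketch: wrap a value in single quotes, doubling any embedded single quote.
sql_string_literal() {
  escaped=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "'%s'\n" "$escaped"
}

# Hypothetical value; prints the literal ready to place in a WHERE clause.
sql_string_literal "Bob's Diner"
```

Values without a single quote pass through unchanged apart from the surrounding quotation marks.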


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

  • Table of Contents
  • Welcome
    • What is Spectrumtrade Location Intelligence for Big Data
    • Spectrumtrade Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
      • Spatial
        • Installing the SDK
        • Hive User-Defined Spatial Functions
          • Setup
          • WritableGeometry
          • Geometry Functions
            • Constructor Functions
              • FromGeoJSON
              • FromWKT
              • FromWKB
              • FromKML
              • ST_Point
                • Persistence Functions
                  • ToGeoJSON
                  • ToWKT
                  • ToWKB
                  • ToKML
                    • Predicate Functions
                      • Disjoint
                      • Intersects
                      • Overlaps
                      • Within
                      • IsNullGeometry
                        • Measurement Functions
                          • Area
                          • ClosestPoints
                          • Distance
                          • Length
                          • Perimeter
                            • Processing Functions
                              • Buffer
                              • ConvexHull
                              • Intersection
                              • Transform
                              • Union
                                • Observer Functions
                                  • ST_X
                                  • ST_XMax
                                  • ST_XMin
                                  • ST_Y
                                  • ST_YMax
                                  • ST_YMin
                                    • Grid Functions
                                      • GeoHashBoundary
                                      • GeoHashID
                                      • HexagonBoundary
                                      • HexagonID
                                      • SquareHashBoundary
                                      • SquareHashID
                                          • Search Functions
                                            • LocalPointInPolygon
                                            • LocalSearchNearest
                                                • Spark
                                                  • Spark Jobs
                                                    • Hexagon Generator
                                                      • Hexagons
                                                          • Spark API
                                                            • JoinByDistance
                                                              • Location Intelligence Jar for Spatial Operations
                                                                • Installing and setting up the Location Intelligence Jar for a Spark Job
                                                                • Integrating the Location Intelligence Jar with Zeppelin
                                                                • Integrating the Location Intelligence Jar with Hue
                                                                  • Appendix
                                                                    • PGD Builder
                                                                      • Building an Index with the PGD Builder
                                                                        • Download Permissions
                                                                        • Operators and Syntax Delimiters
                                                                          • Quote Rules

ST_YMin

Description

The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Parameter | Type | Description
geometry | WritableGeometry | The input geometry.

Return Values

Return Type | Description
Double | The minimum Y value of the input geometry, or Null if the specified value is not a geometry.

Examples

SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;


Grid Functions

A grid divides the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

We provide UDFs for processing three grid cell shapes:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
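
To make the idea of hashing a location into a cell ID concrete, the following is a minimal sketch of the publicly documented geohash scheme (base-32 alphabet, bits alternating between longitude and latitude, longitude first). It is an illustration only: geohash_encode is a hypothetical helper, not part of the SDK, and it is an assumption that the product's geohash IDs follow this public scheme.

```python
# Public geohash base-32 alphabet (the letters a, i, l and o are not used).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Return the geohash string of the given length for a WGS 84 point."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value = 0        # bits collected for the current character
    bit_count = 0
    use_lon = True   # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value = value << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value = value << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(-73.750333, 42.736103, 3))
```

Each extra character adds five bits, so a longer ID always starts with the shorter ID of the enclosing cell. This prefix property is what makes such IDs sortable and searchable.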


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
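
The relationship between a geohash ID and its rectangular cell can be sketched by decoding the ID back into a bounding box. This is a minimal illustration, assuming the public geohash scheme; geohash_bounds is a hypothetical helper and does not reproduce the WritableGeometry returned by the UDF.

```python
# Public geohash base-32 alphabet (the letters a, i, l and o are not used).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(hash_id):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    use_lon = True  # bits alternate, starting with longitude
    for ch in hash_id:
        value = BASE32.index(ch)
        # Walk the 5 bits of this character from most to least significant.
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return (lon_lo, lat_lo, lon_hi, lat_hi)


print(geohash_bounds("dre"))
```

Under this scheme a three-character ID carries 15 bits (8 for longitude, 7 for latitude), so each cell spans 1.40625 degrees in both directions. Every additional character halves the cell several more times, which is why longer strings mean smaller cells.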


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The geohash ID of the grid cell at the specified precision that contains the point.

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID('-73.750333', '42.736103', 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;
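
The last query groups records by cell ID and counts them. The same aggregation can be sketched in memory as follows; this is a small illustration only, where count_by_cell is a hypothetical helper and the IDs are made-up sample values rather than output from the product.

```python
from collections import Counter


def count_by_cell(hash_ids):
    """Count records per cell ID, returned sorted by ID (like GROUP BY)."""
    return sorted(Counter(hash_ids).items())


# Hypothetical cell IDs, one per input record.
sample_ids = ["dre", "drs", "dre", "dre", "drs", "dr7"]
for cell_id, quantity in count_by_cell(sample_ids):
    print(cell_id, quantity)
```

Because every point in a cell maps to the same ID, counting IDs is enough to aggregate by location. No geometry comparison is needed until the cell boundaries are drawn.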


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);


SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID('-73.750333', '42.736103', 3);

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'


Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.

Examples

Using HDFS:

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')
) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points to search for. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
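
The underlying test, whether a point falls inside a polygon, can be sketched with the classic ray-casting technique. This is a generic illustration, not the product's implementation: point_in_ring and polygons_containing are hypothetical helpers, the polygons are simple rings without holes, and points exactly on an edge are not treated specially.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many edges a ray to the right of (x, y) crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Consider only edges that straddle the horizontal line through y.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside  # each crossing flips the state
        j = i
    return inside


def polygons_containing(x, y, records):
    """Return the attributes of every record whose ring contains the point."""
    return [attrs for attrs, ring in records if point_in_ring(x, y, ring)]


# Hypothetical data: two overlapping squares with attribute dictionaries.
records = [
    ({"name": "A"}, [(0, 0), (4, 0), (4, 4), (0, 4)]),
    ({"name": "B"}, [(3, 3), (6, 3), (6, 6), (3, 6)]),
]
print(polygons_containing(3.5, 3.5, records))
```

As with the UDTF, a single input point can yield several output rows when the polygons in the data source overlap.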


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', 'Name Like Park'

Return Values

Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to the input point.

Examples

Using HDFS:

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', 'Name Like Park')
) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points to search from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
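
The effect of the maxCandidates and maxDistance options can be illustrated with a small nearest-neighbor sketch over point data. The assumptions are loud ones: search_nearest and haversine_m are hypothetical helpers, distances use the haversine great-circle formula on a spherical earth (the product's distance calculation is not described here), and the city coordinates are approximate sample values.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(lon, lat, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance_m, name) pairs, nearest first."""
    scored = []
    for name, c_lon, c_lat in candidates:
        d = haversine_m(lon, lat, c_lon, c_lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return scored[:max_candidates]


# Approximate sample coordinates (name, longitude, latitude).
cities = [
    ("Boston", -71.0589, 42.3601),
    ("Albany", -73.7562, 42.6526),
    ("Hartford", -72.6734, 41.7658),
]
for distance, name in search_nearest(-73.750333, 42.736103, cities, max_candidates=2):
    print(name, round(distance))
```

With the defaults, only the single nearest record is returned, which mirrors the documented default of 1 for maxCandidates and no limit for maxDistance.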


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
  --master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
  /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
  -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.

Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
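
What a distance join produces can be shown with a small in-memory sketch. This is not the Spark API: join_by_distance and haversine_m are hypothetical helpers, the sketch compares every pair directly instead of using a geohash-based primary join, and it assumes haversine great-circle distances on a spherical earth.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius, spherical approximation
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair each left record with every right record within max_distance_m.

    Records are (id, longitude, latitude) tuples; each result row is
    (left_id, right_id, distance_m), like a join with a distance column.
    """
    rows = []
    for l_id, l_lon, l_lat in left:
        for r_id, r_lon, r_lat in right:
            d = haversine_m(l_lon, l_lat, r_lon, r_lat)
            if d <= max_distance_m:
                rows.append((l_id, r_id, d))
    return rows


# Hypothetical sample data: one customer and two points of interest.
customers = [("c1", -73.750333, 42.736103)]
pois = [("p1", -73.7510, 42.7370), ("p2", -73.7000, 42.7000)]
for row in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
    print(row)
```

Comparing all pairs grows with the product of the two input sizes. A coarse pre-join on a shared grid key, which is the role the geohashPrecision parameter appears to play, keeps the number of exact distance checks manageable on large inputs.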


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching polygon-based data. The tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which has a PGD file created for each sub-TAB.

Also, the system stops using a PGD file if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file per TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter        Required    Description
-f <file>        yes         Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel>    no          The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
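When the builder is launched from a script, the argument list can be assembled programmatically. The following Python sketch is illustrative only: the helper names `pgd_builder_args` and `capped_parallelism` are hypothetical, and it assumes the executable is invoked as `PGDBuilder` exactly as in the usage line. It uses only the `-f` and `-p` parameters described in this section.

```python
import os


def capped_parallelism(cap):
    """Return a thread count no larger than `cap` or the CPU count (at least 1)."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(cap, cpus))


def pgd_builder_args(tab_path, parallel=None):
    """Build the argument list for a PGDBuilder call (hypothetical helper).

    `parallel` maps to the optional -p parameter; when omitted, the tool's
    own default (the number of available CPUs) applies.
    """
    args = ["PGDBuilder", "-f", tab_path]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        args += ["-p", str(parallel)]
    return args


# Example: cap a seamless TAB build at 4 concurrent threads.
command = pgd_builder_args("seamless.tab", capped_parallelism(4))
```

The resulting list could be passed to a process launcher such as `subprocess.run`; capping the value is one way to follow the advice above for seamless TAB files.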


Download Permissions

Setting the download permissions allows multiple services to download data and to update the downloaded data when required. Create a common operating system group that contains every service user who needs to download the data. For example, if Hive and YARN jobs both need to download data and use the same download location, then both the Hive and YARN operating system users should belong to the common group. The download directory should be assigned to this common group and should have read, write, and execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
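To see what the 775 mode grants, the octal value can be expanded into the familiar rwx notation. This is a small Python sketch using only the standard library; the helper names are hypothetical and it does not touch the file system.

```python
import stat


def describe_mode(mode):
    """Return the ls-style permission string for a directory with these mode bits."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True if the group write bit is set, which downloads by group members need."""
    return bool(mode & stat.S_IWGRP)


# 775: owner rwx, group rwx, others r-x
summary = describe_mode(0o775)
```

With 775, members of the common group can create and replace files in the download directory, while other users can only read and traverse it.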


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator               Definition
Attribute operators    =, <>, !=, <, <=, >, >=
Between                Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects    Returns true if the envelopes (MBRs) of the operands intersect.
Contains               Returns true if the first object contains all of the second object.
Within                 Returns true if the first object is entirely inside the second object.
ContainsCentroid       Returns true if the first object contains the centroid of the second object.
CentroidWithin         Returns true if the first object's centroid is within the second object.
Intersects             Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)              Returns true if the value equals at least one of the values in the literal list or subquery.
Like                   Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                    Returns true if both conditions in the WHERE clause are true.
OR                     Returns true if either the first or the second condition is true.
NOT                    Reverses the meaning of the logical operator with which it is used.
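The wildcard behavior of the Like operator can be illustrated by translating a pattern into a regular expression. This Python sketch is only a model of the rules stated in the table: the function names are hypothetical, and it assumes case-sensitive matching, which the guide does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    % matches zero, one, or multiple characters; _ matches exactly one.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


# '%Park' matches any value ending in "Park"; '_ark' matches any 4-character value ending in "ark".
matches = [like("Central Park", "%Park"), like("Park", "_ark"), like("Parks", "%Park")]
```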

Arithmetic Operators

Operator    Definition
+           Addition; also the concatenation operator. Note: String concatenation also uses &.
-           Subtraction
*           Multiplication
/           Division
^           Exponentiation

Note: Operator math on Time or DateTime values is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter    Definition
( )          Expression delimiters
' '          String constant delimiters. See Quote Rules.
" "          Quoted identifier delimiters
% _          Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,            List item and function argument separator
@            Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic cannot otherwise parse them correctly, for example identifiers that contain spaces or other special characters.

Examples

Identifiers that contain characters that would normally not be allowed (such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a double single quote (two ' characters). The following example defines the string literal O'hara's:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
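When query text is assembled in code, the quoting rules above can be applied by small helpers. The following Python sketch is an illustration, not part of the product: `quote_literal` and `quote_identifier` are hypothetical names, and rejecting identifiers that contain a double quote is an assumption made because the guide does not describe how to escape them.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because the guide does not
    document an escape rule for them (assumption of this sketch).
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


# Build a query against a named table with a literal that contains single quotes.
query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("/Samples/NamedTables/USA"), quote_literal("O'hara's")
)
```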


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com


Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.

Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs, one for each grid cell shape:

• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)

Hashes are useful for analysis and for interoperability with other systems.

Square hash is similar to GeoHash, but has the advantage that the cells appear as squares when displayed in Popular Mercator.

Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.

The following grid index functions are available:

• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID


GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter    Type      Description
UNIQUE_ID    String    The unique geohash identifier of a cell in a grid.

Return Values

Return Type         Description
WritableGeometry    A representation of the boundary of a cell in a grid.

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter    Type                Description
X            Number or String    The longitude value of the point.
Y            Number or String    The latitude value of the point.
precision    Number              The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type         Description
WritableGeometry    The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
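To make the relationship between an ID and its rectangular cell concrete, the following Python sketch decodes an ID into a bounding box using the publicly documented geohash scheme (base-32 characters, bits alternating between longitude and latitude). It assumes, without confirmation from this guide, that the UDF follows that scheme; the function name `geohash_bounds` is hypothetical.

```python
# Base-32 alphabet of the public geohash scheme (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of the cell for a geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    even = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, most significant first
            bit = (value >> shift) & 1
            rng = lon if even else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            even = not even
    return lon[0], lat[0], lon[1], lat[1]


# Each extra character adds 5 bits, so cells shrink as the ID gets longer.
cell = geohash_bounds("ezs42")
```

The four values are the corners of the rectangle that a boundary function would return as a geometry for that cell.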


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter    Type                Description
X            Number or String    The longitude value of the point.
Y            Number or String    The latitude value of the point.
precision    Number              The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type    Description
String         The geohash ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
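For readers who want to see how a longitude, latitude, and precision can turn into a string ID, here is a Python sketch of the publicly documented geohash encoding. It assumes, without confirmation from this guide, that GeoHashID follows that scheme; `geohash_id` is a hypothetical name, and the argument order (X, Y, precision) mirrors the syntax above.

```python
# Base-32 alphabet of the public geohash scheme (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash of `precision` characters."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    chars = []
    bits = 0
    value = 0
    even = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon, x) if even else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # point is in the upper half
        else:
            value = value << 1
            rng[1] = mid  # point is in the lower half
        even = not even
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


# A shorter ID is always a prefix of the longer ID for the same point.
coarse = geohash_id(-5.6, 42.6, 3)
fine = geohash_id(-5.6, 42.6, 5)
```

Because a coarser ID is a prefix of a finer one, sorting or grouping by ID keeps nearby cells together, which is what makes the aggregation queries above practical.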


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter    Type      Description
UNIQUE_ID    String    The unique hash identifier of a cell in the hexagon grid.

Return Values

Return Type         Description
WritableGeometry    A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter    Type                Description
X            Number or String    The longitude value of the point.
Y            Number or String    The latitude value of the point.
precision    Number              The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type         Description
WritableGeometry    The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter    Type                Description
X            Number or String    The longitude value of the point.
Y            Number or String    The latitude value of the point.
precision    Number              The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type    Description
String         The ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter    Type      Description
UNIQUE_ID    String    The unique hash identifier of a cell in the square hash grid.

Return Values

Return Type         Description
WritableGeometry    A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter    Type                Description
X            Number or String    The longitude value of the point.
Y            Number or String    The latitude value of the point.
precision    Number              The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type         Description
WritableGeometry    The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter    Type                Description
X            Number or String    The longitude value of the point.
Y            Number or String    The latitude value of the point.
precision    Number              The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type    Description
String         The ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies on or within: if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter         Type                Description
inputPoint        WritableGeometry    A WritableGeometry representing a point.
dataSourcePath    String              The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options           Map                 Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option                      Description                                                                                                                                                 Example
shpCharset                  The charset to use when reading a shapefile.                                                                                                                'shpCharset', 'utf-8'
shpCrs                      The coordinate reference system to use when reading a shapefile.                                                                                            'shpCrs', 'epsg:4326'
remoteDataSourceLocation    The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).       'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation            The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.       'downloadLocation', '/pb/downloads'
downloadGroup               The operating system group to apply to downloaded data on the local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions.       'downloadGroup', 'pbdownloads'

Return Values

Return Type    Description
geometry       The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned: if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
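The core idea of a point-in-polygon search can be sketched in a few lines. The Python code below uses the classic ray-casting test on planar coordinates; it is an illustration of the concept only, not the product's implementation. The function names and the dictionary of named rings are hypothetical, and points exactly on an edge are not treated specially.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the polygon ring.

    `ring` is a list of (x, y) vertices; the closing edge back to the
    first vertex is implied.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def polygons_containing(x, y, polygons):
    """Return the names of all polygons that contain the point."""
    return [name for name, ring in polygons.items() if point_in_polygon(x, y, ring)]


# Two hypothetical square polygons; the point (2, 2) falls only in the first.
polygons = {
    "west": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "east": [(10, 10), (12, 10), (12, 12), (10, 12)],
}
hits = polygons_containing(2, 2, polygons)
```

As with the UDTF, a point can match more than one polygon when polygons overlap, so the result is a list rather than a single value.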


LocalSearchNearest

Description

The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point

Function Registration

create function LocalSearchNearest as compbbigdataspatialhivesearchLocalSearchNearest

Syntax

LocalSearchNearest(WritableGeometry inputPoint String dataSourcePath [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String the location of the input TAB or shapefile The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node

Note If you are storing and distributing your data remotely using HDFS or S3 you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below

options map Optional Options that allow you to return more than one value return additional information or set other return criteria in ltString Stringgt format

Options

Option Description Example

maxCandidates maxCandidates 3 the maximum number of results to return (if not set the default value is 1)

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 69

Spatial

Description Example

the maximum distance to maxDistance 25 search for results (if not set the default value is no limit)

the distance unit (if not set distanceUnit mi

Option

maxDistance

distanceUnit

returnDistanceColumnName

shpCharset

shpCrs the coordinate reference system to use when reading a shapefile

remoteDataSourceLocation

downloadLocation

downloadGroup

the default value is m for meters)

See the Distance on page 33 function for examples of supported distance units

the name of the column to use for returning the distance

the charset to use when reading a shapefile

the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)

the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)

returnDistanceColumnNameMiles

shpCharset utf-8

shpCrs epsg4326

remoteDataSourceLocationhdfsdatamydatazip

downloadLocationpbdownloads

Note If you are also using Spectrumtrade Geocoding for Big Data and have already set the pbdownloadlocationHive variable then you do not need to set this option here as well

downloadGrouppbdownloads

the operating system group which should be applied to downloaded data on a local file system the default is the value from the Hive strow pbdownloadgroup(required only if you are

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 70

Spatial

Example Description Option

storing and distributing data remotely on HDFS or S3)

For more information see Download Permissions onpage 83

queryFilter search on a subset of the queryFilterName Likedata based on an attribute Park

Note For more information about defining filter expressions see Operators and Syntax Delimiters on page 84

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_pointsid nearestresultcapital nearestresultstateFROM search_points LATERAL VIEW OUTERLocalSearchNearest(FromWKT(search_pointsgeometry search_pointscrs) STATECAPTAB

map(maxCandidates 3remoteDataSourceLocation hdfsdatasearchcapitalszipdownloadLocation pbsearchdownload downloadGroup pbdownloadsqueryFilter Name Like Park)) nearestresult

In the above example id is a field from the search_points table which is the table being used to get the points we are searching from The nearestresultcapital and nearestresultstate fields are from the STATECAP TAB file that we want in our query result In this particular example the maxCandidates option limits the results to 3 records for each search point

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Closely approximating circles, hexagons include edge data better than if rectangles were used. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method which joins two dataframes, taking longitude and latitude values, one set from each dataframe, representing the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
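To make the join semantics concrete, here is a minimal Python sketch of a distance join. It is only an illustration: it compares every pair of rows with a great-circle distance, whereas the product narrows candidates with a geohash-based primary join first. The function names, field names, Earth radius, and sample rows are assumptions of this sketch and are not part of the Spark API.

```python
import math

EARTH_RADIUS_MILES = 3958.7613  # mean Earth radius; an assumption of this sketch


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(base_rows, join_rows, max_miles, distance_column=None):
    """Pair each base row with every join row that lies within max_miles of it."""
    joined = []
    for base in base_rows:
        for other in join_rows:
            d = haversine_miles(base["longitude"], base["latitude"], other["lon"], other["lat"])
            if d <= max_miles:
                row = {**base, **other}
                if distance_column is not None:
                    # Optional extra column, similar in spirit to DistanceColumnName.
                    row[distance_column] = d
                joined.append(row)
    return joined


# Hypothetical rows standing in for baseDF and joinDF.
customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"name": "near", "lon": -73.7550, "lat": 42.7380},
    {"name": "far", "lon": -73.7562, "lat": 42.6526},
]

result = join_by_distance(customers, pois, 0.5, distance_column="outputDistance")
print(result)
```

Only the POI within half a mile is kept, and the optional column carries the computed distance for each joined pair.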

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
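As a quick illustration of what mode 775 grants, the following Python sketch decodes the permission bits using only the standard library. The helper name is an assumption of this sketch; it does not touch the file system.

```python
import stat


def describe_directory_mode(mode):
    """Render permission bits the way `ls -l` shows them for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


mode = 0o775
print(describe_directory_mode(mode))

# Owner and group may read, write, and enter the directory; others may only read and enter.
group_can_write = bool(mode & stat.S_IWGRP)
others_can_write = bool(mode & stat.S_IWOTH)
print(group_can_write, others_can_write)
```

The group write bit is what lets every member of the common group update the downloaded data, while users outside the group cannot modify it.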

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or sub query.

Like Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
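The wildcard behavior of the Like operator can be sketched in Python by translating a pattern into a regular expression. This is an illustration of the two wildcards only, not the MI SQL implementation; the function names are assumptions, and the sketch is case-sensitive, which may differ from the product.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Every other character must match literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park%"))
print(like("Parks", "Park_"))
print(like("Park", "Park_"))
```

The pattern must match the entire value, so a filter that should find a word anywhere in an attribute needs a percent sign on both sides.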

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal Ohara's is defined:

SELECT * FROM Streets WHERE Business = 'Ohara''s'
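When filter expressions are assembled programmatically, the quoting rules above can be applied with two small helpers. The following Python sketch is an illustration under stated assumptions: the helper names are hypothetical, and because this guide does not describe how to embed a double quote inside an identifier, the sketch rejects such identifiers rather than guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes; embedded double quotes are not handled here."""
    if '"' in name:
        raise ValueError("identifier contains a double quote: " + name)
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
country = quote_literal("Canada")
business = quote_literal("O'Brien")

print("SELECT * FROM " + table + " WHERE Country = " + country)
print("SELECT * FROM Streets WHERE Business = " + business)
```

Routing every user-supplied value through a helper like quote_literal keeps an apostrophe in the data from ending the string literal early.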

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes

3001 Summer Street

Stamford CT 06926-0700

USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.

All rights reserved.

GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It also can return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary('ebvnk');

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
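To show how a geohash ID maps to a rectangular cell, the following Python sketch decodes an ID into its bounding box using the widely published geohash scheme (base-32 characters, bits alternating between longitude and latitude). It is an illustration only and is not the GeoHashBoundary implementation; the function names and the WKT formatting are assumptions of this sketch.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of the cell named by a geohash ID."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


def bounds_to_wkt(bounds):
    """Format a bounding box as a closed WKT polygon ring."""
    x1, y1, x2, y2 = bounds
    return f"POLYGON (({x1} {y1}, {x2} {y1}, {x2} {y2}, {x1} {y2}, {x1} {y1}))"


cell = geohash_bounds("ezs42")
print(cell)
print(bounds_to_wkt(cell))
```

Each extra character adds five bits, so the cell shrinks by a factor of 32 in area per character, which is why longer IDs mean smaller grid cells.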

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
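The last query groups records by grid cell. As a rough illustration of that idea, this Python sketch computes a geohash ID with the widely published geohash scheme and counts points per cell. It is not the GeoHashID implementation: the function name, the sample points, and the choice of precision 5 are assumptions of this sketch.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a lon/lat point (degrees, WGS 84) as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value = 0
    bits = 0
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:
            # Every five bits become one base-32 character.
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)


# Hypothetical coordinates: the first two fall in the same precision-5 cell.
points = [(-5.6, 42.6), (-5.601, 42.601), (10.40744, 57.64911)]
counts = Counter(geohash_id(lon, lat, 5) for lon, lat in points)
print(counts)
```

Because a shorter ID is always a prefix of the longer ID for the same point, sorting by ID keeps nearby cells close together, which is what makes the ID useful for ORDER BY and GROUP BY.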

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It also can return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It also can return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The product generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar interactively, you must use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
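Before restarting on EMR, it can help to confirm that the edited line in hue.ini is really uncommented and set to true. The following is a minimal sketch, not part of the product: the function name has_default_configuration is hypothetical, and the check only looks for the line itself, without verifying which ini section it belongs to.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product): succeeds only if the
# given ini file contains an uncommented use_default_configuration=true line.
# It does not check which [section] the line appears in.
has_default_configuration() {
    ini_file="$1"   # path to the hue.ini file edited in step 2
    grep -Eq '^[[:space:]]*use_default_configuration[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$ini_file"
}
```

For example, has_default_configuration path/to/hue.ini && echo "setting is active" prints the message only when the setting is present and uncommented.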

3 - Appendix

In this section

• PGD Builder 81
• Download Permissions 83
• Operators and Syntax Delimiters 84
• Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
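When many TAB files need an index, the commands can be generated rather than typed by hand. The sketch below is an illustration only: print_pgd_commands is a hypothetical helper, the directory layout is assumed, and it merely prints commands in the form given under Usage without running PGDBuilder. How the utility is launched on your platform may differ.

```shell
#!/bin/sh
# Sketch: print one PGDBuilder command per TAB file found in a directory.
# Nothing is executed; the output follows the documented
# "PGDBuilder -f <file> [-p <parallel>]" form.
print_pgd_commands() {
    tab_dir="$1"        # directory to scan (hypothetical layout)
    parallel="${2:-}"   # optional value for -p
    for tab in "$tab_dir"/*.tab "$tab_dir"/*.TAB; do
        [ -f "$tab" ] || continue   # skip glob patterns that matched nothing
        if [ -n "$parallel" ]; then
            printf 'PGDBuilder -f "%s" -p %s\n' "$tab" "$parallel"
        else
            printf 'PGDBuilder -f "%s"\n' "$tab"
        fi
    done
}
```

Reviewing the printed list before running it makes it easier to lower the -p value for seamless TABs. On a case-insensitive file system the two glob patterns may list the same file twice.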

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, and the directory should have Read, Write, and Execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
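After the group and mode have been changed, a quick check can confirm that the download directory ended up as intended. This is a minimal sketch under stated assumptions: check_download_dir is a hypothetical helper, and it relies on GNU stat (the -c option), which is typical on Linux cluster nodes but not available everywhere.

```shell
#!/bin/sh
# Hypothetical helper: report whether a download directory has the expected
# group and permission bits. Requires GNU stat (-c).
check_download_dir() {
    dir="$1"               # the download directory to inspect
    want_group="$2"        # expected group name, for example pbdownloads
    want_mode="${3:-775}"  # expected octal mode; defaults to 775

    [ -d "$dir" ] || { echo "missing: $dir"; return 2; }

    actual_group=$(stat -c '%G' "$dir")   # %G = group name of the owner group
    actual_mode=$(stat -c '%a' "$dir")    # %a = permission bits in octal

    if [ "$actual_group" = "$want_group" ] && [ "$actual_mode" = "$want_mode" ]; then
        echo "ok: $dir"
        return 0
    fi
    echo "mismatch: $dir is $actual_group/$actual_mode, expected $want_group/$want_mode"
    return 1
}
```

The function returns 0 when both values match, 1 on a mismatch, and 2 when the directory does not exist, so it can be used in a pre-flight script before jobs are started.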

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first object's centroid is within the second object

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or subquery

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
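The wildcard behavior described for Like can be tried outside the product by mapping the two wildcards onto shell glob characters. The sketch below is only an illustration of the matching rule, not the product's implementation: like_match is a hypothetical name, matching here is case-sensitive (the product's behavior may differ), and patterns that contain glob characters such as *, ?, or [ are not handled.

```shell
#!/bin/sh
# Illustration of Like-style matching: _ matches exactly one character and
# % matches zero or more characters. Implemented by translating the pattern
# to a shell glob (_ becomes ?, % becomes *).
like_match() {
    value="$1"
    pattern="$2"
    glob=$(printf '%s' "$pattern" | tr '_%' '?*')
    case "$value" in
        $glob) return 0 ;;   # unquoted on purpose so it is treated as a pattern
    esac
    return 1
}
```

With this sketch, like_match "Central Park" "%Park%" succeeds, while like_match "cart" "c_t" fails because the underscore stands for exactly one character.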

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic is unable to parse them correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers, or illegal characters (where they normally would not be allowed, such as /), are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
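When filter expressions are assembled in a script, the doubling rule for embedded single quotes is easy to get wrong by hand. The following sketch shows one way to apply it; sql_quote is a hypothetical helper name, and it covers only the single-quote rule described above, not any other escaping the language may require.

```shell
#!/bin/sh
# Hypothetical helper: wrap a value in single quotes for use as a string
# literal, doubling every embedded single quote.
sql_quote() {
    escaped=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "'%s'" "$escaped"
}
```

A script could then build a filter with something like filter="Business = $(sql_quote "$name")", so that a name containing an apostrophe still produces a well-formed literal.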

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Parameter Type Description

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

SELECT GeoHashBoundary("-73.750333", "42.736103", 3);
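As a rough illustration of what a geohash cell boundary is, the following Python sketch decodes a geohash string into its bounding box and formats it as a WKT polygon. It assumes the grid IDs follow the widely used public geohash scheme (base-32 alphabet, alternating longitude and latitude bits). The helper names geohash_bounds and bounds_to_wkt are hypothetical and are not part of the product API, and the product's own output may differ in detail.

```python
# Sketch only: standard public geohash decoding, not the product's implementation.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    refine_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character
            bit = (value >> shift) & 1
            if refine_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            refine_lon = not refine_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


def bounds_to_wkt(bounds):
    """Format a bounding box as a closed WKT polygon ring."""
    x0, y0, x1, y1 = bounds
    return f"POLYGON (({x0} {y0}, {x1} {y0}, {x1} {y1}, {x0} {y1}, {x0} {y0}))"


if __name__ == "__main__":
    # "dre" is the 3-character standard geohash of (-73.750333, 42.736103).
    print(bounds_to_wkt(geohash_bounds("dre")))
```

Each additional character adds five bits, so a longer ID always describes a smaller cell nested inside the cell of its prefix.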


GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point


Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
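To make the relationship between precision and cell size concrete, here is a minimal Python sketch of the standard public geohash encoding. It assumes that the "well-known" ID mentioned above is the conventional geohash (this is an assumption, not something the guide states explicitly), and the function name geohash_encode is hypothetical rather than part of the product.

```python
# Sketch only: conventional geohash encoding, assumed to match the "well-known" ID.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Encode a WGS 84 longitude/latitude as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    refine_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if refine_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        refine_lon = not refine_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    for precision in (3, 5, 10):
        print(precision, geohash_encode(-73.750333, 42.736103, precision))
```

Because each longer ID starts with the shorter one, sorting or grouping by ID keeps nearby points together, which is what the ORDER BY and GROUP BY examples above rely on.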


HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary("PF625028642");

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary("03332");

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);


SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group to apply to downloaded data on the local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
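For readers unfamiliar with the underlying operation, the following Python sketch shows a point-in-polygon test using the classic ray-casting technique and returns every polygon that contains the point. It only illustrates the concept: the guide does not say which algorithm the UDTF uses, and the function names and sample data are hypothetical. Points exactly on an edge are not handled specially here.

```python
# Sketch only: ray-casting point-in-polygon, not the product's implementation.

def point_in_ring(x, y, ring):
    """Return True if (x, y) is strictly inside the polygon ring.

    ring is a list of (x, y) vertices; the closing vertex is implied.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
        j = i
    return inside


def polygons_containing(x, y, polygons):
    """Return the names of all polygons that contain the point."""
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]


if __name__ == "__main__":
    sample = {
        "large": [(0, 0), (10, 0), (10, 10), (0, 10)],
        "small": [(1, 1), (4, 1), (4, 4), (1, 4)],
        "elsewhere": [(20, 20), (30, 20), (30, 30), (20, 30)],
    }
    print(polygons_containing(2, 2, sample))
```

As in the UDTF, one input point can match several geometries, so the result is a list rather than a single value.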


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, there is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'

Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.

downloadGroup The operating system group to apply to downloaded data on the local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
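The interaction between maxCandidates and maxDistance can be sketched in a few lines of Python. This is a conceptual model only, using planar (Euclidean) distance on made-up coordinates; the function name search_nearest and its parameters are hypothetical stand-ins for the documented options, and the real UDTF works on geometries with proper distance units.

```python
# Sketch only: models the maxCandidates / maxDistance defaults described above.
import heapq
import math


def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, name) pairs, nearest first.

    features maps a name to an (x, y) location. With max_distance=None
    there is no distance limit, mirroring the documented default.
    """
    scored = []
    for name, location in features.items():
        distance = math.dist(point, location)
        if max_distance is None or distance <= max_distance:
            scored.append((distance, name))
    return heapq.nsmallest(max_candidates, scored)


if __name__ == "__main__":
    parks = {"a": (0, 1), "b": (0, 3), "c": (4, 0)}
    print(search_nearest((0, 0), parks))                    # nearest only
    print(search_nearest((0, 0), parks, max_candidates=3))  # up to three
    print(search_nearest((0, 0), parks, max_candidates=3, max_distance=3.5))
```

The distance filter is applied before the candidate limit, so a search point with nothing in range yields no rows; in the Hive query above, LATERAL VIEW OUTER is what keeps such input rows in the result.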


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
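The result of a distance join can be pictured with a small, framework-free Python sketch. It pairs every row of one list with every row of another and keeps the pairs whose great-circle (haversine) distance is within the limit. This brute-force version is for illustration only: the parameter table above indicates that the Spark method uses a geohash-based primary join, which this sketch does not attempt to reproduce, and all names and sample coordinates here are hypothetical.

```python
# Sketch only: brute-force distance join; the Spark API uses a geohash primary join.
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Join rows of (id, lon, lat); return (left_id, right_id, distance_m)."""
    joined = []
    for left_id, left_lon, left_lat in left:
        for right_id, right_lon, right_lat in right:
            distance = haversine_m(left_lon, left_lat, right_lon, right_lat)
            if distance <= max_distance_m:
                joined.append((left_id, right_id, distance))
    return joined


if __name__ == "__main__":
    customers = [("c1", -73.750333, 42.736103)]
    pois = [("near", -73.750333, 42.741103), ("far", -73.750333, 42.756103)]
    for row in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
        print(row)
```

Comparing every pair costs time proportional to the product of the two row counts, which is why a coarse spatial key such as a geohash is typically used to narrow the candidates first.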


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB

Usage

PGDBuilder -f ltfilegt [-p ltparallelgt]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
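When several TAB files need an index, the command line can be assembled in a script. The following Python sketch only builds the argument list from the documented -f and -p parameters and does not run the tool. The helper name build_pgd_command and the sample path are illustrative assumptions, not part of the product.

```python
def build_pgd_command(tab_path, parallel=None):
    """Return the PGDBuilder argument list for one TAB file.

    tab_path: path to the TAB file (the documented -f parameter).
    parallel: optional cap on concurrent processes (the documented -p parameter).
    """
    if not tab_path:
        raise ValueError("a TAB file path is required")
    command = ["PGDBuilder", "-f", tab_path]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        command += ["-p", str(parallel)]
    return command


if __name__ == "__main__":
    # Hypothetical path, used only for illustration.
    print(" ".join(build_pgd_command("/data/seamless.tab", parallel=4)))
```

Leaving parallel unset omits -p, so the tool falls back to its documented default of one process per available CPU.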

Download Permissions

Setting the download permissions allows multiple services to download the data and update it when required. Create a common operating system group that contains every service user that needs to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should be assigned to that common group, with Read, Write, and Execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
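After the steps above, it can be useful to confirm that the download directory really carries 775 permissions. The following Python sketch checks the permission bits with the standard library. The function names are illustrative assumptions, and a temporary directory stands in for the real download location.

```python
import os
import stat
import tempfile


def permission_bits(path):
    """Return the owner/group/other permission bits of path as an octal string."""
    return format(os.stat(path).st_mode & 0o777, "03o")


def is_group_writable_775(path):
    """Return True if path is a directory with rwxrwxr-x (775) permissions."""
    info = os.stat(path)
    return stat.S_ISDIR(info.st_mode) and (info.st_mode & 0o777) == 0o775


if __name__ == "__main__":
    # A throwaway directory stands in for the real download directory.
    demo_dir = tempfile.mkdtemp()
    os.chmod(demo_dir, 0o775)
    print(permission_bits(demo_dir), is_group_writable_775(demo_dir))
    os.rmdir(demo_dir)
```

The check covers only the mode bits. Group ownership and group membership still have to be verified separately, for example with the id command for each service user.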

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first object's centroid is within the second object

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or subquery

Like Returns true if the value matches a pattern that can contain wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
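The wildcard rules for the Like operator in the table above can be illustrated outside of MI SQL. The following Python sketch translates a Like pattern into a regular expression. It assumes case-sensitive matching and no escape character, neither of which this guide specifies, and the function names are illustrative.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    for name in ["Central Park", "Parkway", "Lake"]:
        print(name, like(name, "%Park%"))
```

Because the whole value must match, a pattern without wildcards behaves like an equality test.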

Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as "/") are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
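When filter expressions are generated by a program, the quote rules can be applied mechanically. The following Python sketch wraps literals in single quotes (doubling any embedded single quote) and identifiers in double quotes. The helper names are illustrative assumptions. The guide does not say how a double quote inside an identifier should be handled, so the sketch rejects that case instead of guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because the guide does not
    describe an escape rule for them.
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    # Table, column, and value are sample names used only for illustration.
    query = "SELECT * FROM {} WHERE {} = {}".format(
        quote_identifier("Street Names"),
        quote_identifier("Business"),
        quote_literal("O'Hara's"),
    )
    print(query)
```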

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

Spatial

GeoHashID

Description

The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The geohash ID of the grid cell at the specified precision that contains the point

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (GeoHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;
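To see why a longer ID means a smaller cell, it can help to look at how the public geohash scheme derives an ID. The following Python sketch implements that public algorithm. It assumes, without confirmation from this guide, that GeoHashID follows the same scheme, so treat it as an illustration of the concept and not as the product's implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_encode(lon, lat, precision):
    """Encode a WGS 84 longitude/latitude as a geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    refine_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if refine_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        refine_lon = not refine_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(-73.750333, 42.736103, 3))
    print(geohash_encode(-73.750333, 42.736103, 10))
```

Each additional character halves the cell several more times, so a shorter ID is always a prefix of a longer ID for the same point. That prefix property is what makes the IDs sortable and useful as grouping keys.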

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary("PF625028642");

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary("03332");

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. Example: 'downloadLocation', '/pb/downloads'

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
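The containment test at the heart of a point-in-polygon search can be sketched with the classic ray casting technique. The following Python example is a conceptual illustration only. It works on simple polygons given as vertex lists, does not define the result for points exactly on a boundary, and is not the product's implementation. All names and sample regions are assumptions.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) is inside the polygon described by ring.

    ring is a list of (x, y) vertices; repeating the first vertex at the
    end is optional. A horizontal ray is cast to the right of the point
    and edge crossings are counted: an odd count means "inside".
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y):  # the edge spans the ray's y coordinate
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def find_containing(x, y, regions):
    """Return the names of all regions whose polygon contains the point."""
    return [name for name, ring in regions.items() if point_in_polygon(x, y, ring)]


if __name__ == "__main__":
    # Hypothetical regions in arbitrary planar coordinates.
    regions = {
        "square": [(0, 0), (4, 0), (4, 4), (0, 4)],
        "far": [(10, 10), (12, 10), (12, 12), (10, 12)],
    }
    print(find_containing(2, 2, regions))
```

Returning a list mirrors the behavior described above, where every geometry that contains the search point appears in the output.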

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. Example: 'downloadLocation', '/pb/downloads'

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters. Example: 'queryFilter', "Name Like 'Park'"

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads',
'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
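The interaction between the maxCandidates and maxDistance options can be illustrated with a small brute-force search. The following Python sketch assumes planar coordinates in the same unit as the distance limit and compares points only, whereas the UDTF searches geometries in a TAB or shapefile. The function and parameter names are illustrative and are not the product's API.

```python
import math


def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    point: (x, y) of the search point.
    candidates: mapping of name -> (x, y).
    max_distance: optional limit; None means no limit.
    """
    px, py = point
    scored = []
    for name, (cx, cy) in candidates.items():
        distance = math.hypot(cx - px, cy - py)
        if max_distance is None or distance <= max_distance:
            scored.append((distance, name))
    scored.sort()  # nearest first
    return [(name, distance) for distance, name in scored[:max_candidates]]


if __name__ == "__main__":
    # Hypothetical candidate locations.
    places = {"A": (0, 0), "B": (3, 4), "C": (10, 0)}
    print(search_nearest((0, 0), places, max_candidates=3, max_distance=6))
```

The distance limit is applied before the candidate limit, so a search can return fewer results than maxCandidates when few geometries lie within range.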

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For a given level and longitude/latitude, each generated hexagon has a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK, the jar file name and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

import com.pb.bigdata.li.spark.api.SpatialImplicits._
import org.apache.spark.sql.functions.col

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.
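Before submitting, it can help to confirm that the jar passed to --jars is where the command expects it. The following shell sketch is a hypothetical pre-flight helper, not part of the product: build_submit is an invented function name, the driver jar path and class name are placeholders, and the helper only prints the spark-submit command from step 2 rather than running it.

```shell
#!/bin/sh
# build_submit LI_JAR DRIVER_JAR MAIN_CLASS   (hypothetical helper, not shipped with the SDK)
# Prints the spark-submit command from step 2, or fails if the LI jar is missing.
build_submit() {
    if [ ! -f "$1" ]; then
        echo "LI jar not found: $1" >&2
        return 1
    fi
    echo "spark-submit --class $3 --master yarn --deploy-mode cluster --jars $1 $2"
}

# Demonstration with an empty stand-in file instead of the real jar.
demo_jar="$(mktemp)"
build_submit "$demo_jar" /path/to/custom/driver.jar com.example.spark.app.MyDriver
rm -f "$demo_jar"
```

Printing the command first makes it easy to review the jar path and driver class before anything is submitted to the cluster.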

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

• PGD Builder 81
• Download Permissions 83
• Operators and Syntax Delimiters 84
• Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
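Because one PGD file is generated per TAB file and must be regenerated whenever the TAB data changes, a directory of TAB files usually means several PGDBuilder runs. The sketch below is a hypothetical dry-run helper (pgd_commands is an invented name) that only prints the commands in the Usage form shown above; how PGDBuilder is actually launched on your system, and the directory layout, are assumptions. Paths containing spaces would need extra quoting.

```shell
#!/bin/sh
# pgd_commands DIR [PARALLEL]   (hypothetical helper)
# Prints, without running, one PGDBuilder command for each *.tab file directly inside DIR.
pgd_commands() {
    for tab in "$1"/*.tab; do
        [ -e "$tab" ] || continue      # the glob matched nothing
        if [ -n "$2" ]; then
            echo "PGDBuilder -f $tab -p $2"
        else
            echo "PGDBuilder -f $tab"
        fi
    done
}

# Demonstration against a scratch directory holding two empty stand-in files.
demo_dir="$(mktemp -d)"
touch "$demo_dir/uk.tab" "$demo_dir/seamless.tab"
pgd_commands "$demo_dir" 4
rm -rf "$demo_dir"
```

Reviewing the printed list first is a cheap way to check which TAB files would be indexed, and with what -p limit, before committing machine time.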

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
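After the steps above, it is worth checking that the mode and group membership are what you expect. The following shell sketch is illustrative only: in_group is a hypothetical helper name, and a scratch directory stands in for the real download directory so that the snippet can run without sudo.

```shell
#!/bin/sh
# Succeeds when user $1 belongs to group $2 (compares against the names printed by `id -nG`).
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Scratch directory standing in for the real download directory.
download_dir="$(mktemp -d)"

# Same mode as step 5: read, write, execute for owner and group; read and execute for others.
chmod 775 "$download_dir"

# The first ten characters of `ls -ld` are the file type and permission bits.
mode="$(ls -ld "$download_dir" | cut -c1-10)"
echo "mode: $mode"

# Report membership for one service user; prints "missing" where the user or group does not exist.
if in_group hive pbdownloads; then echo "hive: member"; else echo "hive: missing"; fi

rmdir "$download_dir"
```

On a real cluster you would point the ls -ld check at the actual download directory and repeat the in_group check for each service user added in step 2.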

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first object's centroid is within the second object

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or subquery

Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
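The wildcard behavior described for the Like operator can be tried out with ordinary shell pattern matching, because the percent sign behaves like the shell's * and the underscore like the shell's ?. The sketch below is an approximation for experimenting with patterns, not the MI SQL implementation: like_match is a hypothetical helper, matching here is case-sensitive, and values or patterns that themselves contain *, ? or [ are not handled.

```shell
#!/bin/sh
# like_match VALUE PATTERN   (hypothetical helper)
# Approximates Like by rewriting % to * and _ to ?, then letting the
# shell's own pattern matching do the comparison.
like_match() {
    glob="$(printf '%s' "$2" | tr '%_' '*?')"
    case "$1" in
        $glob) echo "match" ;;
        *)     echo "no match" ;;
    esac
}

like_match "Parkway" "Park%"    # % matches the trailing "way"
like_match "Park"    "Park%"    # % also matches zero characters
like_match "Bark"    "_ark"     # _ matches exactly one character
like_match "Park"    "_ark_"    # the second _ has no character left to match
```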

Arithmetic Operators

Operator Definition

+ Addition; also the string concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List item and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers, including those that contain characters that would not normally be allowed, are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal Ohara's is defined:

SELECT * FROM Streets WHERE Business = 'Ohara''s'
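When filter values come from a script rather than being typed by hand, the doubling rule for single quotes is easy to forget. The sketch below shows one way to apply it with sed; sql_quote is a hypothetical helper name, and it covers only the single-quote rule described in this section, not any other validation of the value.

```shell
#!/bin/sh
# sql_quote VALUE   (hypothetical helper)
# Wraps VALUE in single quotes and doubles every embedded single quote.
sql_quote() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

sql_quote "Ohara's"
sql_quote "Canada"
```

The quoted result can then be placed directly after the comparison operator in a WHERE clause or filter expression.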

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
        • Grid Functions
          • GeoHashBoundary
          • GeoHashID
          • HexagonBoundary
          • HexagonID
          • SquareHashBoundary
          • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
          • Hexagons
      • Spark API
        • JoinByDistance
        • Location Intelligence Jar for Spatial Operations
          • Installing and setting up the Location Intelligence Jar for a Spark Job
          • Integrating the Location Intelligence Jar with Zeppelin
          • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules
Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID(-73.750333, 42.736103, 3);

SELECT GeoHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter Type Description

UNIQUE_ID String The unique geohash identifier of a cell in a grid

Return Values

Return Type Description

WritableGeometry A representation of the boundary of a cell in a grid

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

WritableGeometry The boundary of the grid cell at the given precision that the point falls into

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF function returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries within which the search point lies are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'. Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'

For more information, see Download Permissions on page 83.

Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF function returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'. Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.

downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'". Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map('maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
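The effect of the maxCandidates and maxDistance options can be illustrated with a small brute-force sketch in Python. This is not how the UDTF is implemented: the function names, the haversine distance, and the sample capital coordinates are all assumptions made for illustration.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (spherical approximation)."""
    r = 6371008.8  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates features, nearest first, optionally capped by max_distance (meters)."""
    lon, lat = point
    scored = []
    for f in features:
        d = haversine_m(lon, lat, f["lon"], f["lat"])
        if max_distance is None or d <= max_distance:
            scored.append((d, f))
    scored.sort(key=lambda pair: pair[0])
    # Attach the distance to each returned record, like a returned distance column.
    return [dict(f, distance=d) for d, f in scored[:max_candidates]]

# Hypothetical sample data standing in for the rows of a TAB file.
capitals = [
    {"capital": "Albany", "state": "NY", "lon": -73.7562, "lat": 42.6526},
    {"capital": "Hartford", "state": "CT", "lon": -72.6734, "lat": 41.7658},
    {"capital": "Boston", "state": "MA", "lon": -71.0589, "lat": 42.3601},
    {"capital": "Sacramento", "state": "CA", "lon": -121.4944, "lat": 38.5816},
]

results = search_nearest((-73.750333, 42.736103), capitals, max_candidates=3)
nearby = search_nearest((-73.750333, 42.736103), capitals, max_candidates=3, max_distance=50_000)
```

With max_candidates set to 3, each search point yields at most three rows; adding max_distance can reduce that number further, which is why the Hive example uses LATERAL VIEW OUTER to keep search points that have no match.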

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
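As a rough illustration of consuming WKT output, the Python sketch below parses a single-ring POLYGON string into coordinate pairs. It assumes each output record is a simple POLYGON without holes; the sample string and the function name parse_wkt_polygon are hypothetical and are not taken from the product.

```python
def parse_wkt_polygon(wkt):
    """Parse a single-ring 'POLYGON ((x y, x y, ...))' string into a list of (x, y) tuples."""
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError("expected a POLYGON geometry")
    start = text.index("((") + 2
    end = text.rindex("))")
    ring = text[start:end]
    if "(" in ring or ")" in ring:
        raise ValueError("polygons with holes are not handled in this sketch")
    # Each vertex is 'x y'; vertices are separated by commas.
    return [tuple(float(v) for v in pair.split()) for pair in ring.split(",")]

# Hypothetical hexagon: six distinct vertices plus the repeated closing vertex.
sample = "POLYGON ((1 0, 0.5 0.866, -0.5 0.866, -1 0, -0.5 -0.866, 0.5 -0.866, 1 0))"
vertices = parse_wkt_polygon(sample)
```

A production pipeline would more likely hand the WKT to a geometry library or to the FromWKT Hive function; the hand-written parser is only meant to show the shape of the data.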

Sample Output

[Figure: sample output of the Hexagon Generator]

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon, or its ID, for the location.

The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.

Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
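To make the result of a distance join concrete, here is a brute-force Python sketch of the same idea on plain lists of dictionaries. It is an assumption-laden illustration, not the Spark implementation: it skips the geohash-based primary join entirely, uses a spherical haversine distance, and the names join_by_distance, customers and pois are hypothetical.

```python
import math

MILE_M = 1609.344  # meters per international mile

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (spherical approximation)."""
    r = 6371008.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_m, distance_column=None):
    """Pair every left row with every right row that lies within max_distance_m of it."""
    rows = []
    for a in left:
        for b in right:
            d = haversine_m(a["longitude"], a["latitude"], b["lon"], b["lat"])
            if d <= max_distance_m:
                row = {**a, **b}
                if distance_column:
                    row[distance_column] = d  # mirrors the DistanceColumnName option
                rows.append(row)
    return rows

customers = [{"customer": "c1", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "cafe", "lon": -73.750333, "lat": 42.739103},    # about 330 m north
    {"poi": "museum", "lon": -73.750333, "lat": 42.756103},  # about 2.2 km north
]

joined = join_by_distance(customers, pois, 0.5 * MILE_M, distance_column="outputDistance")
```

Comparing every pair is quadratic in the input size, which is why a real implementation narrows the candidates first, for example with the geohash precision parameter described above.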

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to this common group and grant Read, Write, and Execute permissions to the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
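The meaning of mode 775 can be checked with Python's standard stat module. The sketch below only decodes the permission bits; the helper name describe_mode is an assumption introduced for illustration and is not part of the product.

```python
import stat

def describe_mode(mode):
    """Summarize who may write to a directory with the given permission bits."""
    return {
        "listing": stat.filemode(stat.S_IFDIR | mode),  # ls -l style string
        "owner_can_write": bool(mode & stat.S_IWUSR),
        "group_can_write": bool(mode & stat.S_IWGRP),
        "others_can_write": bool(mode & stat.S_IWOTH),
    }

summary = describe_mode(0o775)
```

With 775, the owner and every member of the common group can create and replace files in the download directory, while other users can only read and traverse it.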

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or sub query.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
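The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is only an illustration of the percent and underscore semantics; it assumes case-sensitive matching and no escape character, neither of which is stated above, and the function names are hypothetical.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

matches_park = like("Central Park West", "%Park%")
matches_single = like("Pork", "P_rk")
```

Note that a pattern without wildcards behaves like an equality test in this sketch, so a filter intended to find names containing a word needs the percent sign on both sides.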

Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' | String constant delimiters. See Quote Rules on page 85.
" | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.
, | List item and function argument separators
@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
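When filter expressions are assembled programmatically, the quoting rules above can be applied with small helpers such as the following Python sketch. The function names are hypothetical, and rejecting identifiers that contain a double quote is a conservative assumption, since the rules above do not say how such identifiers are escaped.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes; embedded double quotes are rejected in this sketch."""
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'

query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("/Samples/NamedTables/USA"),
    quote_literal("O'haras"),
)
```

Building the expression this way keeps a stray apostrophe in user-supplied text from ending the string literal early.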

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
      • Grid Functions
        • GeoHashBoundary
        • GeoHashID
        • HexagonBoundary
        • HexagonID
        • SquareHashBoundary
        • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
          • Hexagons
      • Spark API
        • JoinByDistance
        • Location Intelligence Jar for Spatial Operations
          • Installing and setting up the Location Intelligence Jar for a Spark Job
          • Integrating the Location Intelligence Jar with Zeppelin
          • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules
HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary('PF625028642');

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;
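The aggregation pattern in the last query (assign each point a cell ID, then count points per ID) can be sketched in plain Python. The cell_id function below is a hypothetical stand-in that uses a simple square longitude/latitude grid; it does not reproduce the IDs that HexagonID generates, and the sample points are invented.

```python
import math
from collections import Counter

def cell_id(lon, lat, cell_deg=0.01):
    """Stand-in grid ID: index of the square cell (cell_deg degrees wide) that contains the point."""
    col = math.floor(lon / cell_deg)
    row = math.floor(lat / cell_deg)
    return f"{col}:{row}"

# Hypothetical coordinates; the first two fall in the same cell.
points = [(-73.7503, 42.7361), (-73.7508, 42.7365), (-73.6905, 42.7305)]

# Equivalent of: SELECT id, count(*) ... GROUP BY id
counts = Counter(cell_id(lon, lat) for lon, lat in points)
```

Because every point in a cell receives the same ID, grouping by the ID is enough to aggregate; no geometry comparison is needed at query time.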

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude latitude (in WGS 84) and a precision The precision determines how large the grid cells are (higher precision means smaller grid cells) It returns the string ID of the grid cell at the specified precision that contains the point Square hash cells appear square when displayed on a Popular Mercator map

Function Registration

create function SquareHashID as compbbigdataspatialhivegridSquareHashID

Syntax

SquareHashID(Number|String X Number|String Y Number precision)

Parameters

Parameter Type Description

Number or String The longitude value of the point

Y Number or String The latitude value of the point

precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter        Type               Description
inputPoint       WritableGeometry   A WritableGeometry representing a point.
dataSourcePath   String             The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
                                    Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options          Map                Optional. Options that allow you to set return criteria, in <String, String> format.

Options

shpCharset
    Description: The charset to use when reading a shapefile.
    Example: 'shpCharset', 'utf-8'

shpCrs
    Description: The coordinate reference system to use when reading a shapefile.
    Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
    Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
    Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
    Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
    Note: If you are also using Spectrum Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
    Example: 'downloadLocation', '/pb/downloads'

downloadGroup
    Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
    Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type   Description
geometry      The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
    FromWKT(pip_points.geometry, pip_points.crs),
    'STATECAP.TAB',
    map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
        'downloadLocation', '/pb/pip/download',
        'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter        Type               Description
inputPoint       WritableGeometry   A WritableGeometry representing the point to search near.
dataSourcePath   String             The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
                                    Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options          Map                Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

maxCandidates
    Description: The maximum number of results to return (if not set, the default value is 1).
    Example: 'maxCandidates', '3'

maxDistance
    Description: The maximum distance to search for results (if not set, the default value is no limit).
    Example: 'maxDistance', '25'

distanceUnit
    Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
    Example: 'distanceUnit', 'mi'

returnDistanceColumnName
    Description: The name of the column to use for returning the distance.
    Example: 'returnDistanceColumnName', 'Miles'

shpCharset
    Description: The charset to use when reading a shapefile.
    Example: 'shpCharset', 'utf-8'

shpCrs
    Description: The coordinate reference system to use when reading a shapefile.
    Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation
    Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
    Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation
    Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
    Note: If you are also using Spectrum Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
    Example: 'downloadLocation', '/pb/downloads'

downloadGroup
    Description: The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
    Example: 'downloadGroup', 'pbdownloads'

queryFilter
    Description: Search on a subset of the data, based on an attribute.
    Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
    Example: 'queryFilter', "Name Like 'Park'"

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map('maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', "Name Like 'Park'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
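The options argument of the search functions is an ordinary Hive map of string keys to string values. As a minimal sketch, assuming you assemble queries in a client-side script, the hypothetical helpers hive_string and hive_map_literal below (they are not part of the product) render a Python dict as a Hive map(...) expression. The sketch assumes Hive's backslash escaping inside single-quoted string literals; the option names used are the ones documented above.

```python
def hive_string(value) -> str:
    """Render a value as a single-quoted Hive string literal.

    Assumption: backslash escaping, as in standard Hive string literals.
    """
    text = str(value)
    text = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def hive_map_literal(options: dict) -> str:
    """Render a dict as a Hive map('key', 'value', ...) expression.

    Keys and values are emitted in the dict's insertion order.
    """
    parts = []
    for key, value in options.items():
        parts.append(hive_string(key))
        parts.append(hive_string(value))
    return "map(" + ", ".join(parts) + ")"


# Hypothetical usage with documented LocalSearchNearest options.
print(hive_map_literal({"maxCandidates": 3, "distanceUnit": "mi"}))
```

Generating the map expression in one place keeps the key/value pairing explicit, which is easy to get wrong when a long option list is typed by hand.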


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude and level, the API always generates the same hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter          Type        Description
df2                DataFrame   The dataframe to join to.
df1Longitude       Column      The longitude value from the first dataframe.
df1Latitude        Column      The latitude value from the first dataframe.
df2Longitude       Column      The longitude value from the second dataframe.
df2Latitude        Column      The latitude value from the second dataframe.
maxDistance        Length      The buffer length around point 1 to search for point 2.
geohashPrecision   Integer     The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options            Map         Optional. Options that add extra attributes to the result of the join.

Options

Key                  Type     Description
DistanceColumnName   String   Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type   Description
DataFrame     The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
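To make the semantics of a distance join concrete, the sketch below pairs every record of one list with every record of another whose great-circle distance is within a maximum. It is a brute-force illustration only, not the SDK's implementation (which, per the geohashPrecision parameter, uses a geohash for its primary join). The function names, the haversine formula, and the sample coordinates are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius, in meters
METERS_PER_MILE = 1609.344   # international mile


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Brute-force distance join over lists of (id, lon, lat) tuples.

    Returns (left_id, right_id, distance_m) for every pair within range.
    """
    pairs = []
    for left_id, left_lon, left_lat in left:
        for right_id, right_lon, right_lat in right:
            d = haversine_m(left_lon, left_lat, right_lon, right_lat)
            if d <= max_distance_m:
                pairs.append((left_id, right_id, d))
    return pairs


# Hypothetical sample data: one customer and two points of interest.
customers = [("c1", -73.750333, 42.736103)]
pois = [("near", -73.750333, 42.740103), ("far", -73.750333, 43.0)]

for row in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
    print(row)
```

The nested loop compares every pair, which is why a production join on large dataframes needs a spatial pre-partitioning step such as the geohash used by joinByDistance.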


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application         Do this
Spark job           Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook   Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook        Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter       Required   Description
-f <file>       yes        Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel>   no         The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
                           When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
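After applying the steps above, it can be useful to confirm that the download directory really carries mode 775 (read, write, and execute for owner and group; read and execute for others). The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the check covers the permission bits only, not group ownership or membership.

```python
import os
import stat
import tempfile


def permission_bits(path: str) -> int:
    """Return the rwx permission bits of path (for example 0o775)."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o777


def is_mode_775_directory(path: str) -> bool:
    """True if path is a directory whose permission bits are exactly 775."""
    return os.path.isdir(path) and permission_bits(path) == 0o775


# Demonstration on a throwaway directory; a real check would point at the
# directory configured as the download location.
with tempfile.TemporaryDirectory() as demo_dir:
    os.chmod(demo_dir, 0o775)
    print(oct(permission_bits(demo_dir)), is_mode_775_directory(demo_dir))
```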


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator              Definition
Attribute operators   =, <>, !=, <, <=, >, >=
Between               Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects   Returns true if the envelopes (MBRs) of the operands intersect.
Contains              Returns true if the first object contains all of the second object.
Within                Returns true if the first object is entirely inside the second object.
ContainsCentroid      Returns true if the first object contains the centroid of the second object.
CentroidWithin        Returns true if the first object's centroid is within the second object.
Intersects            Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)             Returns true if the value equals at least one of the values in the literal list or subquery.
Like                  Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                   Returns true if both conditions in the WHERE clause are true.
OR                    Returns true if either the first or second condition is true.
NOT                   Reverses the meaning of the logical operator with which it is used.
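The wildcard behavior described for the Like operator can be sketched by translating a pattern into a regular expression: the percent sign matches zero or more characters and the underscore matches exactly one. This is an illustration of the semantics only, not the product's implementation; it assumes case-sensitive matching and no escape character, neither of which is specified here, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern":
    """Translate a Like pattern into a compiled regular expression.

    '%' becomes '.*' (zero or more characters), '_' becomes '.' (exactly
    one character), and every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


# Hypothetical values, in the spirit of a filter such as: Name Like '%Park'
for name in ["Central Park", "Parking Lot", "Park"]:
    print(name, like(name, "%Park"), like(name, "P_rk"))
```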

Arithmetic Operators

Operator   Definition
+          Addition; also the concatenation operator. Note: String concatenation also uses &.
-          Subtraction
*          Multiplication
/          Division
^          Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter   Definition
( )         Expression delimiters
' '         String constant delimiters. See Quote Rules on page 85.
" "         Quoted identifier delimiters
% _         Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,           List items and function argument separators
@           Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain illegal characters (characters that normally would not be allowed, such as "/") are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
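The quoting rules above are mechanical enough to apply in code when a query is assembled from user-supplied values. The sketch below, with hypothetical helper names, doubles embedded single quotes in string literals and wraps identifiers in double quotes. Because this section does not describe how to escape a double quote inside an identifier, the sketch rejects such identifiers rather than guess.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes.

    Escaping of an embedded double quote is not covered by the rules
    above, so such names are rejected instead of being guessed at.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote: " + name)
    return '"' + name + '"'


# Hypothetical usage: build a query like the examples in this section.
query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
print(query)
```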

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com


Parameter | Type | Description
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that contains the point.

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that contains the point.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SquareHashID

Description

The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on: if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'


Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB file or shapefile) in which the input point resides. Every geometry that the search point lies within or on is returned: if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')
) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, there is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like 'Park'"

Return Values

Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")
) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


Spark

Spark Jobs

Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
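The submit command above is long enough that a quoting slip is easy to make. As a minimal sketch, assuming a bash shell on the machine that submits the job, the command can be assembled as an array so that each argument stays a single word. The function name hexgen_command and all four arguments are hypothetical placeholders; the class name and the -output, -conf, and -overwrite flags are taken from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: print the Hexagon Generator submit command for review.
# hexgen_command is a hypothetical helper, not part of the product.
hexgen_command() {
  local app_name=$1 jar=$2 output_dir=$3 config=$4
  local cmd=(
    spark-submit
    --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver
    --master yarn
    --deploy-mode cluster
    --name "$app_name"
    "$jar"
    -output "$output_dir"
    -conf "$config"
    -overwrite
  )
  # %q quotes each word so the printed line can be pasted back into a shell.
  printf '%q ' "${cmd[@]}"
  printf '\n'
}
```

Printing the command first makes it possible to check the paths before anything is submitted to the cluster.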


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions see Location Intelligence Jar for Spatial Operations on page 77

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 within which to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.

Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for the case where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the LI jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
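For the EMR branch of the procedure, the edit to the Hue configuration file can be scripted instead of made by hand. The following is a minimal sketch, assuming GNU sed and a bash shell on the master node. The function name enable_default_configuration is hypothetical, and the script treats leading whitespace, # and ; as the comment markers to strip, which is an assumption about how the line is commented out in your file.

```shell
#!/usr/bin/env bash
# Sketch only: uncomment use_default_configuration and set it to true.
# enable_default_configuration is a hypothetical helper, not part of Hue.
enable_default_configuration() {
  local ini=$1
  # Keep a backup next to the original before editing in place.
  cp -p "$ini" "$ini.bak" || return 1
  # Replace the whole line, whether commented out or not, with the enabled setting.
  sed -E -i \
    's/^[[:space:]#;]*use_default_configuration[[:space:]]*=.*/use_default_configuration=true/' \
    "$ini"
}
```

Pass the path of the Hue configuration file as the only argument. The original file is preserved with a .bak suffix so the change can be reverted.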

3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
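Because the builder handles one TAB file per invocation, indexing a whole directory means calling it once per file. The following is a minimal sketch, assuming a bash shell on Linux and that the tool can be invoked as PGDBuilder. The function name build_pgd_indexes and the PGD_BUILDER variable are hypothetical; only the -f and -p options come from the usage shown above.

```shell
#!/usr/bin/env bash
# Sketch only: run the PGD Builder once for every TAB file in a directory.
# PGD_BUILDER is a hypothetical variable naming however you invoke the tool.
build_pgd_indexes() {
  local dir=$1 parallel=${2:-}
  local builder=${PGD_BUILDER:-PGDBuilder}
  local tab
  for tab in "$dir"/*.tab "$dir"/*.TAB; do
    [[ -e $tab ]] || continue   # skip a glob pattern that matched nothing
    if [[ -n $parallel ]]; then
      "$builder" -f "$tab" -p "$parallel"
    else
      "$builder" -f "$tab"
    fi
  done
}
```

Setting PGD_BUILDER=echo turns the loop into a dry run that prints the arguments for each file, which is a cheap way to confirm which files would be indexed. Remember that a PGD file must be regenerated whenever its TAB data changes.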


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. All the service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute permissions (775) for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
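After completing the steps, it can be useful to confirm that the download directory ended up with the intended group and mode. The following is a minimal sketch, assuming a bash shell and GNU coreutils stat (as found on typical Linux cluster nodes). The function name check_download_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: verify that a download directory has mode 775 and the expected group.
# check_download_dir is a hypothetical helper; assumes GNU stat (Linux).
check_download_dir() {
  local dir=$1 group=$2
  local mode owner_group
  mode=$(stat -c '%a' "$dir") || return 2          # octal permission bits
  owner_group=$(stat -c '%G' "$dir") || return 2   # group name of the directory
  if [[ $mode != 775 ]]; then
    echo "mode is $mode, expected 775" >&2
    return 1
  fi
  if [[ $owner_group != "$group" ]]; then
    echo "group is $owner_group, expected $group" >&2
    return 1
  fi
}
```

Call it with the download directory and the common group name as arguments. A non-zero exit status means one of the two checks failed, and the reason is printed to standard error.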


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
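The wildcard behavior of the Like operator can be tried out locally before writing a filter expression. The following is a minimal sketch in bash that maps the two wildcards onto shell glob matching: percent becomes * and underscore becomes ?. The function name like_match is hypothetical, and the sketch assumes case-sensitive matching, which may not be how the product compares strings.

```shell
#!/usr/bin/env bash
# Sketch only: emulate the Like wildcards with bash glob matching.
# like_match is a hypothetical helper; matching here is case-sensitive.
like_match() {
  local value=$1 pattern=$2 glob="" ch i
  for ((i = 0; i < ${#pattern}; i++)); do
    ch=${pattern:i:1}
    case $ch in
      %) glob+='*' ;;   # percent: zero, one, or multiple characters
      _) glob+='?' ;;   # underscore: exactly one character
      # Escape characters that are special in globs so they match literally.
      '*' | '?' | '[' | ']' | '\') glob+="\\$ch" ;;
      *) glob+=$ch ;;
    esac
  done
  # The right-hand side is deliberately unquoted so it is treated as a pattern.
  [[ $value == $glob ]]
}
```

For instance, like_match "Central Park" "%Park" succeeds because the percent sign absorbs the leading text, while like_match "Parking" "%Park" fails because nothing in the pattern accounts for the trailing characters.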


Arithmetic Operators

Operator Definition

+ Addition also concatenation operator NOTE String concatenation also uses amp

- Subtraction

Multiplication

Division

^ Exponentiation

Note Operator math on Time or DateTime is not supported You can add a number to a Date but not to a Time or DateTime

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, | List items and function argument separators
@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
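
When filter expressions are assembled by a program, the quoting rules can be applied mechanically. The Python sketch below is an illustration only: quote_literal and quote_identifier are hypothetical helper names, not product functions, and the sketch assumes identifiers never contain a double-quote character, a case this guide does not cover.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes (for names with spaces or special characters)."""
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
print(query)
# SELECT * FROM "Streets" WHERE Business = 'O''hara''s'
```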

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc. All rights reserved.


HexagonID

Description

The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (HexagonID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.

Return Values

Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary('03332');

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary('-73.750333', '42.736103', 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID('-73.750333', '42.736103', 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'

Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like 'Park'"

Return Values

Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'"))
nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
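
How maxCandidates and maxDistance interact can be seen in a small stand-alone model. The Python sketch below is not the UDTF's implementation: it uses planar coordinates and straight-line distance, whereas the product works on real geometries with distance units, and the function name search_nearest is hypothetical. It also assumes that a candidate lying exactly at the maximum distance is kept, which this guide does not state.

```python
import heapq
import math


def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, name) pairs, nearest first.

    max_candidates defaults to 1 and max_distance to "no limit", mirroring
    the documented defaults of the maxCandidates and maxDistance options.
    """
    scored = []
    for name, (x, y) in candidates.items():
        d = math.hypot(x - point[0], y - point[1])
        if max_distance is None or d <= max_distance:  # assumption: inclusive limit
            scored.append((d, name))
    return heapq.nsmallest(max_candidates, scored)


parks = {"A": (0.0, 1.0), "B": (3.0, 4.0), "C": (0.0, 2.0)}

print(search_nearest((0.0, 0.0), parks))
# [(1.0, 'A')]
print(search_nearest((0.0, 0.0), parks, max_candidates=3, max_distance=2.5))
# [(1.0, 'A'), (2.0, 'C')]
```

In the second call three candidates are requested, but only two lie within the distance limit, so only two are returned.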


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

[Figure: Sample Output]

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.

Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the distance calculated.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
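
To make the result of a distance join concrete, the following stand-alone Python sketch pairs each row of one list with the rows of another that lie within a maximum distance and records the computed distance in an extra column. It is a conceptual model only, not how joinByDistance is implemented: it compares every pair of rows, whereas the product uses a geohash-based primary join, and it assumes a spherical great-circle (haversine) distance, which this guide does not specify. The function names and sample rows are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters (spherical approximation)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair every left row with each right row that lies within max_distance_m."""
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                # Merge both rows and add the distance, like a distance column option.
                result.append({**l_row, **r_row, "outputDistance": d})
    return result


customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.7520, "lat": 42.7365},
    {"poi": "far", "lon": -73.9000, "lat": 42.9000},
]

rows = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([row["poi"] for row in rows])  # ['near']
```

The nested loop costs one distance calculation per pair of rows, which is why a coarse spatial pre-join such as the geohashPrecision step matters on large dataframes.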


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

| Parameter | Required | Description |
|---|---|---|
| -f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file. |
| -p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine. |

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
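
When a directory holds several TAB files, the single-file invocation shown above can be wrapped in a loop. The following is a minimal sketch and not part of the product: it assumes that PGDBuilder is on the PATH, and the function name build_pgd_indexes and the DRY_RUN variable are hypothetical names introduced here for illustration.

```shell
# Sketch only: build_pgd_indexes and DRY_RUN are illustrative names,
# and PGDBuilder is assumed to be on the PATH.
# Usage: build_pgd_indexes <directory> <max-parallel>
build_pgd_indexes() {
  data_dir=$1
  max_parallel=$2
  # Match the .tab extension in any letter case without listing a file twice.
  for tab in "$data_dir"/*.[Tt][Aa][Bb]; do
    [ -e "$tab" ] || continue   # the pattern matched nothing
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of running it.
      printf 'PGDBuilder -f %s -p %s\n' "$tab" "$max_parallel"
    else
      PGDBuilder -f "$tab" -p "$max_parallel"
    fi
  done
}
```

Running the function with DRY_RUN=1 prints one PGDBuilder command per TAB file without executing anything, which is a cheap way to review the list before building the indexes for real.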


Download Permissions

Setting the download permissions allows multiple services to download the data and update it when required. All the service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
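
Once the group and mode have been changed, it can be worth confirming the result before any job tries to download data. The following is a minimal sketch, assuming a Linux host with GNU stat (the -c option is GNU-specific); check_download_dir is a hypothetical helper name, not a product command.

```shell
# Sketch only: check_download_dir is an illustrative helper.
# Assumes GNU stat (stat -c), as found on most Linux distributions.
# Succeeds only if DIR belongs to GROUP and has mode 775.
check_download_dir() {
  dir=$1
  expected_group=$2
  # %G = name of the owning group, %a = permission bits in octal
  actual=$(stat -c '%G %a' "$dir") || return 1
  [ "$actual" = "$expected_group 775" ]
}
```

The function returns a zero exit status only when both the group and the mode match, so it can guard a job launch, for example by calling it with the download directory and pbdownloads and aborting if it fails.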


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

| Operator | Definition |
|---|---|
| Attribute operators | =, <>, !=, <, <=, >, >= |
| Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator. |
| EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect. |
| Contains | Returns true if the first object contains all of the second object. |
| Within | Returns true if the first object is entirely inside the second object. |
| ContainsCentroid | Returns true if the first object contains the centroid of the second object. |
| CentroidWithin | Returns true if the first object's centroid is within the second object. |
| Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object. |
| In (List) | Returns true if the value equals at least one of the values in the literal list or subquery. |
| Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination. |
| AND | Returns true if both conditions in the WHERE clause are true. |
| OR | Returns true if either the first or second condition is true. |
| NOT | Reverses the meaning of the logical operator with which it is used. |
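
The two Like wildcards behave much like shell glob characters, which gives a quick way to experiment with patterns outside the product. The sketch below illustrates the wildcard rules only and is not the MI SQL implementation: like_match is a hypothetical helper, matching here is case-sensitive (the guide does not state how Like treats case), and patterns that contain shell glob characters such as * or [ are not handled.

```shell
# Sketch only: like_match is an illustrative helper, not part of MI SQL.
# Usage: like_match <value> <like-pattern>
like_match() {
  value=$1
  # Translate Like wildcards to shell glob syntax: % -> *, _ -> ?
  glob=$(printf '%s' "$2" | sed 's/%/*/g; s/_/?/g')
  case $value in
    $glob) return 0 ;;   # the whole value matched the pattern
    *) return 1 ;;
  esac
}
```

With this helper, the pattern %Park accepts any value ending in Park, while Park_ accepts Park followed by exactly one more character.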


Arithmetic Operators

| Operator | Definition |
|---|---|
| + | Addition; also the concatenation operator. Note: String concatenation also uses &. |
| - | Subtraction |
| * | Multiplication |
| / | Division |
| ^ | Exponentiation |

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

| Delimiter | Definition |
|---|---|
| ( ) | Expression delimiters |
| ' ' | String constant delimiters. See Quote Rules on page 85. |
| " " | Quoted identifier delimiters |
| % _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character |
| , | List item and function argument separators |
| @ | Parameter names |

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers that contain characters that normally would not be allowed (such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
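
When filter values come from user input or from another data set, the quote-doubling rule described above can be applied mechanically. The following is a minimal sketch; mi_sql_quote is a hypothetical helper name, and it handles only the single-quote rule, not any other validation of the value.

```shell
# Sketch only: mi_sql_quote is an illustrative helper.
# Prints VALUE as an MI SQL string literal, doubling every
# embedded single quote and wrapping the result in single quotes.
mi_sql_quote() {
  escaped=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "'%s'\n" "$escaped"
}
```

Passing the value O'hara's to the helper produces the same literal that appears in the Streets example, so the output can be placed directly after the = operator in a WHERE clause.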


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com


Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

| Parameter | Type | Description |
|---|---|---|
| UNIQUE_ID | String | The unique geohash identifier of a cell in a grid. |

Return Values

| Return Type | Description |
|---|---|
| WritableGeometry | A representation of the boundary of a cell in a grid. |

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary("03332");

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

| Parameter | Type | Description |
|---|---|---|
| X | Number or String | The longitude value of the point. |
| Y | Number or String | The latitude value of the point. |
| precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells). |

Return Values

| Return Type | Description |
|---|---|
| WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into. |

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

| Parameter | Type | Description |
|---|---|---|
| X | Number or String | The longitude value of the point. |
| Y | Number or String | The latitude value of the point. |
| precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells). |

Return Values

| Return Type | Description |
|---|---|
| String | The ID of the grid cell, at the specified precision, that contains the point. |


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

| Parameter | Type | Description |
|---|---|---|
| inputPoint | WritableGeometry | A WritableGeometry representing a point. |
| dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below. |
| options | Map | Optional. Options that allow you to set return criteria, in <String, String> format. |

Options

| Option | Description | Example |
|---|---|---|
| shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8' |
| shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326' |
| remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip' |
| downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads' |
| downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads' |


Return Values

| Return Type | Description |
|---|---|
| geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output. |

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in the query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

| Parameter | Type | Description |
|---|---|---|
| inputPoint | WritableGeometry | A WritableGeometry representing the point to search near. |
| dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below. |
| options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format. |

Options

| Option | Description | Example |
|---|---|---|
| maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3' |
| maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25' |
| distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi' |
| returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles' |
| shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8' |
| shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326' |
| remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip' |
| downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads' |
| downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads' |
| queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like '%Park%'" |

Return Values

| Return Type | Description |
|---|---|
| geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point. |

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


Spark

Spark Jobs

Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data to process large sets of data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-<version>.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

| Parameter | Type | Description |
|---|---|---|
| df2 | DataFrame | The dataframe to join to. |
| df1Longitude | Column | The longitude value from the first dataframe. |
| df1Latitude | Column | The latitude value from the first dataframe. |
| df2Longitude | Column | The longitude value from the second dataframe. |
| df2Latitude | Column | The latitude value from the second dataframe. |
| maxDistance | Length | The buffer length around point 1 within which to search for point 2. |
| geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required. |
| options | Map | Optional. Options that add extra attributes to the result of the join. |

Options

| Key | Type | Description |
|---|---|---|
| DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance. |

Return Values

| Return Type | Description |
|---|---|
| DataFrame | The dataframe that is the result of the join. |

Examples

This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

| Application | Do this |
|---|---|
| Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77 |
| Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78 |
| Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79 |

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.

2. Start the Spark job using the following command.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here.
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here.
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
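For the EMR case in step 2, the change to the use_default_configuration line can also be previewed from the command line. The following is a minimal sketch under stated assumptions: the function name enable_default_configuration is hypothetical, and it assumes the line is either active or commented out with one or more # characters. It writes the modified content to standard output and leaves the file itself untouched, so the result can be inspected before anything is replaced.

```shell
# Sketch: print a Hue ini file with use_default_configuration switched to
# true. Handles both a commented-out line and an active "=false" line.
enable_default_configuration() {
  ini_file=$1
  sed -E 's/^[[:space:]#]*use_default_configuration[[:space:]]*=[[:space:]]*false/use_default_configuration=true/' "$ini_file"
}

# Demonstration on a throwaway file rather than the real hue.ini.
demo_ini=$(mktemp)
printf '%s\n' '[desktop]' '  ## use_default_configuration=false' > "$demo_ini"
enable_default_configuration "$demo_ini"
rm -f "$demo_ini"
```

To apply the change for real, redirect the output to a new file, compare it with the original, and then copy it into place.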

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter        Required   Description
-f <file>        yes        Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel>    no         The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
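Because one PGD file is built per TAB file, indexing a directory of TAB files means one PGDBuilder call per file. The following is a minimal sketch of how those calls could be listed on a Linux host. It assumes the utility can be invoked as PGDBuilder, which may differ in your installation, and the function name pgd_commands is hypothetical. Only the -f and -p parameters described above are used, and the commands are printed rather than executed.

```shell
# Sketch: print one PGDBuilder command per TAB file in a directory.
# Commands are printed, not executed, so they can be reviewed first.
pgd_commands() {
  tab_dir=$1
  parallel=${2:-}   # optional cap on concurrent processes (-p)
  for tab in "$tab_dir"/*.tab "$tab_dir"/*.TAB; do
    [ -e "$tab" ] || continue   # skip patterns that matched nothing
    if [ -n "$parallel" ]; then
      printf 'PGDBuilder -f %s -p %s\n' "$tab" "$parallel"
    else
      printf 'PGDBuilder -f %s\n' "$tab"
    fi
  done
}

# Example usage (the directory is a placeholder):
# pgd_commands /data/tabs 4
```

Paths that contain spaces would need quoting before the printed commands are run.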

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. All service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should be assigned to that common group, with Read, Write, and Execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Set the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
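After the last step, it can be useful to confirm that the download directory has the expected group and 775 permissions. The following is a minimal sketch that reads both values from ls -ld. The function name check_download_dir is hypothetical, and the check assumes a directory without setuid, setgid, or sticky bits set.

```shell
# Sketch: verify that a directory has rwxrwxr-x (775) permissions and
# belongs to the expected group. Prints "ok" or the first mismatch found.
check_download_dir() {
  dir=$1
  expected_group=$2
  # ls -ld prints: permissions, link count, owner, group, ...
  set -- $(ls -ld "$dir")
  perms=$1
  group=$4
  case "$perms" in
    drwxrwxr-x*) ;;   # 775 on a directory
    *) echo "unexpected permissions: $perms"; return 1 ;;
  esac
  if [ "$group" != "$expected_group" ]; then
    echo "unexpected group: $group"
    return 1
  fi
  echo "ok"
}

# Example usage with the names from the steps above:
# check_download_dir /pb/downloads pbdownloads
```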

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator              Definition
Attribute operators   =, <>, !=, <, <=, >, >=
Between               Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects   Returns true if the envelopes (MBRs) of the operands intersect.
Contains              Returns true if the first object contains all of the second object.
Within                Returns true if the first object is entirely inside the second object.
ContainsCentroid      Returns true if the first object contains the centroid of the second object.
CentroidWithin        Returns true if the first object's centroid is within the second object.
Intersects            Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)             Returns true if the value equals at least one of the values in the literal list or subquery.
Like                  Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                   Returns true if both conditions in the WHERE clause are true.
OR                    Returns true if either the first or second condition is true.
NOT                   Reverses the meaning of the logical operator with which it is used.
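The wildcard behavior of the Like operator can be illustrated outside of MI SQL. The following is a minimal sketch that maps the two Like wildcards onto shell glob patterns. It is an analogy rather than the product's implementation: the function name like_match is hypothetical, the comparison here is case-sensitive (the case sensitivity of Like is not stated in this guide), and glob characters such as * or [ inside the pattern keep their shell meaning.

```shell
# Sketch: test a value against a Like-style pattern using shell globbing.
# % maps to * (zero or more characters) and _ maps to ? (exactly one).
like_match() {
  value=$1
  glob=$(printf '%s' "$2" | tr '%_' '*?')
  # $glob is deliberately unquoted so that case treats it as a pattern.
  case "$value" in
    $glob) return 0 ;;
    *) return 1 ;;
  esac
}

like_match "Central Park" "%Park" && echo "match"
like_match "Parking" "%Park" || echo "no match"
```

The first call matches because % absorbs the leading text. The second does not, because the pattern must account for the whole value and nothing follows Park in the pattern.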

Arithmetic Operators

Operator   Definition
+          Addition; also the concatenation operator. Note: String concatenation also uses &.
-          Subtraction
*          Multiplication
/          Division
^          Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter   Definition
( )         Expression delimiters
' '         String constant delimiters. See Quote Rules on page 85.
" "         Quoted identifier delimiters
% _         Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
,           List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers, or illegal characters (where they normally would not be allowed, such as /), are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In cases where a single quote is within a string literal or value, use a double single quote (two characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
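When query text is assembled in a script, the quoting rules can be applied with two small helpers. The following is a minimal sketch: the function names sql_string_literal and sql_identifier are hypothetical, and identifiers that themselves contain a double quote are not handled, since this guide does not describe that case.

```shell
# Sketch: wrap a value in single quotes, doubling any embedded single quote.
sql_string_literal() {
  escaped=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "'%s'\n" "$escaped"
}

# Sketch: wrap an identifier in double quotes (embedded double quotes are
# not handled here).
sql_identifier() {
  printf '"%s"\n' "$1"
}

printf 'SELECT * FROM %s WHERE Country = %s\n' \
  "$(sql_identifier /Samples/NamedTables/USA)" \
  "$(sql_string_literal Canada)"
```

A value with an embedded single quote is emitted with that quote doubled, which matches the rule described above.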

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc. All rights reserved.

SquareHashBoundary

Description

The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

Syntax

SquareHashBoundary(String UNIQUE_ID)

Parameters

Parameter   Type     Description
UNIQUE_ID   String   The unique geohash identifier of a cell in a grid.

Return Values

Return Type        Description
WritableGeometry   A representation of the boundary of a cell in a grid.

Examples

SELECT SquareHashBoundary(hashStringId) FROM hivetable;

SELECT SquareHashBoundary("03332");

Syntax

SquareHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Parameter   Type               Description
X           Number or String   The longitude value of the point.
Y           Number or String   The latitude value of the point.
precision   Number             The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type        Description
WritableGeometry   The boundary of the grid cell, at the given precision, that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);

SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter   Type               Description
X           Number or String   The longitude value of the point.
Y           Number or String   The latitude value of the point.
precision   Number             The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type   Description
String        The ID of the grid cell, at the specified precision, that contains the point.

Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;

Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter        Type               Description
inputPoint       WritableGeometry   A WritableGeometry representing a point.
dataSourcePath   String             The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below.
options          Map                Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value of pb.download.group in Hive (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Return Values

Return Type   Description
geometry      The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')
) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter        Type               Description
inputPoint       WritableGeometry   A WritableGeometry representing the point to search near.
dataSourcePath   String             The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below.
options          Map                Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option: maxCandidates
Description: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'

Option: maxDistance
Description: The maximum distance to search for results (if not set, the default value is no limit).
Example: 'maxDistance', '25'

Option: distanceUnit
Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'

Option: returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value of pb.download.group in Hive (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Option: queryFilter
Description: Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Example: 'queryFilter', "Name Like '%Park%'"

Return Values

Return Type   Description
geometry      The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like '%Park%'")
) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
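To get a feel for the geohashPrecision values, the sketch below estimates the size of a geohash cell at a given precision. It assumes the conventional geohash encoding (5 bits per character, with longitude taking the first bit); this guide does not describe how the SDK partitions data internally, so treat the numbers as orientation only. The function name and the meters-per-degree constant are illustrative.

```python
def geohash_cell_size_deg(precision):
    """Return (width, height) in degrees of a conventional geohash cell."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when bits is odd
    lat_bits = bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


# Approximate length in meters of one degree along a great circle.
METERS_PER_DEGREE = 111_195.0

for p in (5, 6, 7, 8):
    width, height = geohash_cell_size_deg(p)
    print(p, round(width * METERS_PER_DEGREE), "m wide,",
          round(height * METERS_PER_DEGREE), "m tall at the equator")
```

Under these assumptions a precision 7 cell is roughly 150 m on a side at the equator, which can be compared with the maxDistance you plan to use when choosing a precision.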


Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the distance calculated.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
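The shape of the result of a distance join can be illustrated without Spark. The following is a minimal sketch in plain Python, not the SDK's implementation: it compares every pair of rows and uses the haversine great-circle formula, whereas the SDK uses a geohash-based primary join and its own distance calculation. The sample rows are made up; the column names mirror the examples above.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Naive join: pair each left row with every right row within range."""
    result = []
    for a_row in left:
        for b_row in right:
            d = haversine_m(a_row["longitude"], a_row["latitude"],
                            b_row["lon"], b_row["lat"])
            if d <= max_distance_m:
                # Merge both rows and add the computed distance as a column.
                result.append({**a_row, **b_row, "outputDistance": d})
    return result


customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"name": "near", "lon": -73.750333, "lat": 42.741103},  # about 556 m north
    {"name": "far", "lon": -73.750333, "lat": 42.756103},   # about 2.2 km north
]
HALF_MILE_M = 0.5 * 1609.344

matches = join_by_distance(customers, pois, HALF_MILE_M)
print([m["name"] for m in matches])
```

Each output row carries the attributes of both inputs plus the distance, which is the role the DistanceColumnName option plays in the second example.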


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
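When the builder is driven from a script, the advice about lowering -p can be applied mechanically. The sketch below only assembles an argument list from the two documented flags and never exceeds the CPU count of the machine it runs on; the helper name is hypothetical, and how PGDBuilder is actually launched on your system is an assumption.

```python
import os


def pgd_builder_args(tab_path, max_parallel=None):
    """Build a PGDBuilder argument list, keeping -p between 1 and the CPU count."""
    args = ["PGDBuilder", "-f", tab_path]
    if max_parallel is not None:
        cpus = os.cpu_count() or 1
        args += ["-p", str(max(1, min(max_parallel, cpus)))]
    return args


print(pgd_builder_args(r"C:\data\uk.tab"))
print(pgd_builder_args(r"C:\seamless.tab", max_parallel=4))
```

Omitting max_parallel leaves -p off the command line so that the tool's own default applies.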


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
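As a quick check of what mode 775 grants, the following sketch prints the symbolic form of the mode and verifies the permission bits on a directory. It uses a temporary directory as a stand-in for the download location and only Python standard library calls; the helper name is illustrative, and the check assumes a POSIX file system.

```python
import os
import stat
import tempfile


def mode_matches(path, expected_mode=0o775):
    """Return True if the permission bits of path equal expected_mode."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected_mode


# 775 = read/write/execute for owner and group, read/execute for others.
print(stat.filemode(stat.S_IFDIR | 0o775))

with tempfile.TemporaryDirectory() as download_dir:
    os.chmod(download_dir, 0o775)
    print(mode_matches(download_dir))
```

The same check can be pointed at the real download directory after step 5 to confirm that group members have write access.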


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if equals at least one of the values in the literal list or sub query.
Like | Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
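The wildcard rules for Like and the inclusive behavior of Between can be mimicked in a few lines. The sketch below translates a Like pattern into a regular expression and is only a model of the description above: it treats matching as case-sensitive, which is an assumption, and it is not the MI SQL engine's implementation. All function names are illustrative.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: _ is one character, % is zero or more."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, low, high):
    """Between is inclusive at both ends."""
    return low <= value <= high


print(like("Central Park", "%Park"))
print(like("Park", "P_rk"), like("Paark", "P_rk"))
print(between(5, 1, 5))
```

Combining the two symbols works as expected in this model: a pattern such as `_a%` matches any value whose second character is `a`.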


Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, | List items and function argument separators
@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
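When queries are assembled in code, the quoting rules can be applied by small helpers. This is a minimal sketch of the two rules described above (single quotes around values with embedded single quotes doubled, double quotes around identifiers); the helper names are illustrative, and escaping of double quotes inside identifiers is not covered because this guide does not describe it.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes."""
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
print(query)
```

Quoting every identifier is harmless under these rules, since quotes are only required when the parser cannot otherwise read the name.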


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.

All rights reserved


Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.

Examples

SELECT SquareHashBoundary(x, y, precision) FROM hivetable;

SELECT SquareHashBoundary(-73.750333, 42.736103, 3);

SELECT SquareHashBoundary("-73.750333", "42.736103", 3);


SquareHashID

Description

The SquareHashID function takes a longitude, latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash
SELECT (SquareHashID(x, y, 10)) AS hashID
FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity
FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF function returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries where the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group which should be applied to downloaded data on a local file system; the default is the value of pb.download.group in Hive (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'


Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries where the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Examples

Using HDFS:

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;

In the above example, id is a field from the pip_points table, which is the table being used to get the points we are searching from. The pipresult.capital and pipresult.state fields are from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


LocalSearchNearest

Description

The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance on page 33 function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group which should be applied to downloaded data on a local file system; the default is the value of pb.download.group in Hive (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like '%Park%'"

Return Values

Return Type | Description
geometry | The nearest geometry or geometries contained in the specified TAB or shapefile to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'"))
nearestresult;

In the above example, id is a field from the search_points table, which is the table being used to get the points we are searching from. The nearestresult.capital and nearestresult.state fields are from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
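The interaction of maxCandidates and maxDistance can be modeled with a brute-force search over point features. The sketch below is a simplified stand-in, not the UDTF's implementation: it handles points only, measures haversine great-circle distance in meters, and uses made-up candidate data. Its defaults follow the options table (one result, no distance limit).

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, candidate) pairs, nearest first."""
    lon, lat = point
    scored = [(haversine_m(lon, lat, c["lon"], c["lat"]), c) for c in candidates]
    if max_distance_m is not None:
        scored = [s for s in scored if s[0] <= max_distance_m]
    scored.sort(key=lambda s: s[0])
    return scored[:max_candidates]


places = [
    {"name": "B", "lon": 0.0, "lat": 0.02},  # about 2.2 km away
    {"name": "A", "lon": 0.0, "lat": 0.01},  # about 1.1 km away
    {"name": "C", "lon": 0.0, "lat": 0.5},   # about 55.6 km away
]

print([c["name"] for _, c in search_nearest((0.0, 0.0), places)])
print([c["name"] for _, c in search_nearest((0.0, 0.0), places,
                                            max_candidates=3, max_distance_m=5000)])
```

With a distance limit, fewer than maxCandidates results can come back; this is why the Hive example uses LATERAL VIEW OUTER, which keeps search points that have no match.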


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-<version>.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Closely approximating circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and LI SDK API Assuming that the product was installed to pblisoftware as described in Installing the SDK on page 8 the jar filename and path is

pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar

For installation instructions see Location Intelligence Jar for Spatial Operations on page 77

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2.

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
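
The idea behind a distance join can be sketched outside Spark. The following is a minimal Python illustration and not the product's implementation: it assumes a spherical (haversine) distance and uses a naive nested loop, whereas joinByDistance narrows candidate pairs with geohashes at the given precision before comparing distances. The function names, the sample rows, and the earth-radius constant are all assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6371008.8   # assumed mean earth radius (spherical model)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Naive join: pair every left row with each right row within max_miles.

    Rows are (id, longitude, latitude) tuples. The result rows carry the
    computed distance in miles, similar in spirit to DistanceColumnName.
    """
    max_m = max_miles * METERS_PER_MILE
    joined = []
    for left_id, left_lon, left_lat in left:
        for right_id, right_lon, right_lat in right:
            d = haversine_m(left_lon, left_lat, right_lon, right_lat)
            if d <= max_m:
                joined.append((left_id, right_id, d / METERS_PER_MILE))
    return joined


# Hypothetical sample data: one customer and two points of interest.
customers = [("c1", -73.750333, 42.736103)]
pois = [("near", -73.752000, 42.737000), ("far", -73.650000, 42.800000)]
matches = join_by_distance(customers, pois, 0.5)
```

Only the "near" point falls inside the half-mile buffer here. A nested loop compares every pair, which is why a grid-based pre-filter such as a geohash matters at cluster scale.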


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case:

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from:

      use_default_configuration=false

      to:

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to this common group and grant Read, Write, and Execute permissions to the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
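
After changing permissions, it can be useful to confirm the result programmatically. The following is a small Python sketch that checks whether a directory's permission bits are exactly 775. The helper name has_mode_775 is hypothetical, and the check is demonstrated on a temporary directory rather than a real download location.

```python
import os
import stat
import tempfile


def has_mode_775(path):
    """Return True when the permission bits of path are exactly rwxrwxr-x."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o775


# Demonstration on a throwaway directory; in practice you would pass the
# directory configured as your download location.
with tempfile.TemporaryDirectory() as demo_dir:
    os.chmod(demo_dir, 0o700)
    before = has_mode_775(demo_dir)
    os.chmod(demo_dir, 0o775)
    after = has_mode_775(demo_dir)
```

The octal value 775 gives read, write, and execute to the owner and group, and read and execute to everyone else.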


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or sub query.

Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
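
The wildcard behavior described for the Like operator can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is only an illustration of the matching rules as stated in the table, not how MI SQL evaluates Like internally. The helper names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


ends_with_park = like("Central Park", "%Park")
one_extra_char = like("Parks", "Park_")
missing_char = like("Park", "Park_")
combined = like("Oak Park East", "%Park_%")
```

The pattern must match the entire value, so "%Park" matches values that end in "Park" but not "Parkway".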


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
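
When filter expressions are assembled by a program, the quote rules above can be applied with two small helpers. This Python sketch is illustrative only; the function names are hypothetical. Because the guide does not say how a double quote inside an identifier should be escaped, the sketch rejects such identifiers rather than guessing.

```python
def sql_string_literal(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def sql_identifier(name):
    """Wrap an identifier in double quotes.

    Escaping of embedded double quotes is not covered by the quote rules,
    so this sketch refuses such names instead of inventing a rule.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


table = sql_identifier("/Samples/NamedTables/USA")
country_query = "SELECT * FROM " + table + " WHERE Country = " + sql_string_literal("Canada")
business_query = "SELECT * FROM Streets WHERE Business = " + sql_string_literal("O'haras")
```

Doubling the embedded quote keeps the literal intact, so business_query ends with 'O''haras'.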


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


SquareHashID

Description

The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.

Function Registration

create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

Syntax

SquareHashID(Number|String X, Number|String Y, Number precision)

Parameters

Parameter Type Description

X Number or String The longitude value of the point.

Y Number or String The latitude value of the point.

precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).

Return Values

Return Type Description

String The ID of the grid cell at the specified precision that contains the point.


Examples

SELECT SquareHashID(x, y, precision) FROM hivetable;

SELECT SquareHashID(-73.750333, 42.736103, 3);

SELECT SquareHashID("-73.750333", "42.736103", 3);

CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries within which the search point lies are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point.

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to set return criteria in <String, String> format.

Options

Option Description Example

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. Example: 'downloadLocation', '/pb/downloads'

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which supplies the points to search with. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria in <String, String> format.

Options

Option Description Example

maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'

maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'

distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'

returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'

shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'

shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'

remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. Example: 'downloadLocation', '/pb/downloads'

downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'

queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type Description

geometry The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which supplies the points to search from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
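
The effect of the maxCandidates and maxDistance options can be sketched in a few lines of Python. This is a conceptual illustration only and not how the UDTF is implemented: it searches plain points rather than TAB geometries, assumes a spherical (haversine) distance in meters, and uses hypothetical function names and sample data.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean earth radius (spherical model)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    point is (lon, lat); candidates are (name, lon, lat) tuples. Candidates
    farther than max_distance_m are dropped; None means no distance limit.
    """
    scored = []
    for name, lon, lat in candidates:
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Hypothetical candidates spaced along the equator.
places = [("d", 1.0, 0.0), ("b", 0.02, 0.0), ("a", 0.01, 0.0), ("c", 0.03, 0.0)]
default_result = search_nearest((0.0, 0.0), places)
top_three = search_nearest((0.0, 0.0), places, max_candidates=3)
within_1500m = search_nearest((0.0, 0.0), places, max_candidates=3, max_distance_m=1500)
```

With the defaults only the single nearest candidate is returned, which mirrors the documented default of 1 for maxCandidates and no limit for maxDistance.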

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 72

Spatial

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is then used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The product generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
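
The assign-then-aggregate pattern described in this section can be sketched in Python. This is only an illustration, not the product API: the hexagon grid and its ID scheme are not reproduced here, so a hypothetical square-cell function (cell_id) stands in for the hexagon lookup, and the cell size of 0.01 degrees is an arbitrary assumption. The only property the sketch relies on is the one stated above: the same longitude/latitude always maps to the same ID.

```python
import math
from collections import defaultdict


def cell_id(lon, lat, cell_deg=0.01):
    """Hypothetical stand-in for a hexagon ID lookup: a fixed square grid.

    The same longitude/latitude always yields the same ID, which is the
    property the aggregation step depends on.
    """
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def aggregate_by_cell(records, cell_deg=0.01):
    """Sum the value of every (lon, lat, value) record per grid cell."""
    totals = defaultdict(float)
    for lon, lat, value in records:
        totals[cell_id(lon, lat, cell_deg)] += value
    return dict(totals)


records = [
    (-73.7503, 42.7361, 10.0),
    (-73.7508, 42.7365, 5.0),   # falls in the same cell as the first record
    (-73.6042, 42.9013, 2.0),   # falls in a different cell
]
totals = aggregate_by_cell(records)
```

A smaller cell size plays the role of a finer level: more cells, fewer records per cell.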

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar file name and path are:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two DataFrames using longitude and latitude values, one set from each DataFrame, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter         Type       Description
df2               DataFrame  The DataFrame to join to.
df1Longitude      Column     The longitude value from the first DataFrame.
df1Latitude       Column     The latitude value from the first DataFrame.
df2Longitude      Column     The longitude value from the second DataFrame.
df2Latitude       Column     The latitude value from the second DataFrame.
maxDistance       Length     The buffer length around point 1 to search for point 2.
geohashPrecision  Int        The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options           Map        Optional. Options that add extra attributes to the result of the join.
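
To give a feel for what the geohashPrecision value controls, the following Python sketch implements the standard public geohash encoding. It is not the product's code, and how the product uses geohashes internally for the primary join is not described here; the function name geohash_encode is an assumption made for this illustration. Each additional character of precision subdivides the previous cell, so a higher precision means smaller cells and more distinct keys.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Encode a WGS84 longitude/latitude as a standard geohash string."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(10.40744, 57.64911, 5)
fine = geohash_encode(10.40744, 57.64911, 7)
```

The finer hash always starts with the coarser one, which is why nearby points tend to share a prefix.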


Options

Key                 Type    Description
DistanceColumnName  String  Adds a column to the result DataFrame that contains the calculated distance.

Return Values

Return Type  Description
DataFrame    The DataFrame that is the result of the join.

Examples

This example returns a DataFrame that is the result of a join where points from a second DataFrame are located within a 0.5-mile buffer around each point in the first DataFrame:

import org.apache.spark.sql.functions.col

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

import com.mapinfo.midev.unit.{Length, LinearUnit}

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
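
The semantics of a distance join can be illustrated without Spark. The Python sketch below is a brute-force version under stated assumptions: it uses the haversine formula on a spherical Earth, which may differ from the distance calculation the product performs, and the names haversine_m and join_by_distance, the sample coordinates, and the outputDistance key are all chosen for this illustration. It pairs each left-hand record with every right-hand record that lies within the maximum distance.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Brute-force join: keep every left/right pair within max_distance_m."""
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                # Merge both records and record the distance, like DistanceColumnName.
                result.append({**l_row, **r_row, "outputDistance": d})
    return result


customers = [{"id": 1, "longitude": -73.7503, "latitude": 42.7361}]
pois = [
    {"name": "near", "lon": -73.7520, "lat": 42.7370},
    {"name": "far", "lon": -73.9000, "lat": 42.8000},
]
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
```

Comparing every pair is quadratic in the input size. A coarse spatial key such as a geohash is one common way to narrow the candidate pairs before the exact distance test.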


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application        Do this
Spark job          Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook  Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook       Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit \
  --class com.example.spark.app.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the LI jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter      Required  Description
-f <file>      yes       Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel>  no        The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates PGD files for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
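
As a quick check of what mode 775 grants, the following Python sketch decodes the octal value with the standard library's stat module. It is an illustration only and does not touch any directory; the variable names are chosen for this example.

```python
import stat

mode = 0o775

# Render the mode the way `ls -l` would show it for a directory.
symbolic = stat.filemode(stat.S_IFDIR | mode)

# Owner and group each hold read, write, and execute.
owner_rwx = (mode & stat.S_IRWXU) == stat.S_IRWXU
group_rwx = (mode & stat.S_IRWXG) == stat.S_IRWXG

# Everyone else can read and traverse the directory but not write to it.
other_can_write = bool(mode & stat.S_IWOTH)
other_can_read = bool(mode & stat.S_IROTH)
```

The symbolic form is drwxrwxr-x: members of the common group can update the downloaded data, while users outside the group have read-only access.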


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator             Definition
Attribute operators  =, <>, !=, <, <=, >, >=
Between              Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects  Returns true if the envelopes (MBRs) of the operands intersect.
Contains             Returns true if the first object contains all of the second object.
Within               Returns true if the first object is entirely inside the second object.
ContainsCentroid     Returns true if the first object contains the centroid of the second object.
CentroidWithin       Returns true if the first object's centroid is within the second object.
Intersects           Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)            Returns true if the value equals at least one of the values in the literal list or subquery.
Like                 Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                  Returns true if both conditions in the WHERE clause are true.
OR                   Returns true if either the first or second condition is true.
NOT                  Reverses the meaning of the logical operator with which it is used.
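
The wildcard behavior of the Like operator can be mimicked with a regular expression. The Python sketch below translates a Like pattern into a regex under two assumptions that this guide does not confirm: matching is case-sensitive, and no escape character is supported. The function names like_to_regex and like are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


ends_with_park = like("Central Park", "%Park")
single_char = like("cat", "c_t")
combined = like("Parkway 9", "Park%_")
```

Because the whole value must match, "%Park" accepts "Central Park" and "Park" but rejects "Parks".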


Arithmetic Operators

Operator  Definition
+         Addition; also the concatenation operator. Note: String concatenation also uses &.
-         Subtraction
*         Multiplication
/         Division
^         Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter  Definition
( )        Expression delimiters
' '        String constant delimiters. See Quote Rules on page 85.
" "        Quoted identifier delimiters
% _        Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,          List item and function argument separators
@          Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain characters that would not normally be allowed (such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
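
When a filter expression is assembled programmatically, the quoting rules above can be applied with two small helpers. This Python sketch is an assumption-laden illustration: the helper names quote_literal and quote_identifier are hypothetical, the sample value is invented, and identifiers that themselves contain a double quote are not handled because this guide does not describe that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes not handled)."""
    return '"' + name + '"'


query = (
    "SELECT * FROM " + quote_identifier("/Samples/NamedTables/USA")
    + " WHERE Country = " + quote_literal("Canada")
)
escaped = quote_literal("D'Angelo")
```

Doubling the quote keeps a value that contains an apostrophe from ending the literal early.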


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved.

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
        • Grid Functions
          • GeoHashBoundary
          • GeoHashID
          • HexagonBoundary
          • HexagonID
          • SquareHashBoundary
          • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
          • Hexagons
      • Spark API
        • JoinByDistance
      • Location Intelligence Jar for Spatial Operations
        • Installing and setting up the Location Intelligence Jar for a Spark Job
        • Integrating the Location Intelligence Jar with Zeppelin
        • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules
In the above example id is a field from the search_points table which is the table being used to get the points we are searching from The nearestresultcapital and nearestresultstate fields are from the STATECAP TAB file that we want in our query result In this particular example the maxCandidates option limits the results to 3 records for each search point

Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 71

Spatial

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The product generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
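
The maximum-distance test behind a distance join can be pictured as a great-circle check between two longitude/latitude pairs. The sketch below is an illustration only: it uses the haversine formula with a mean earth radius of 3958.8 miles, which is an assumption and not necessarily the calculation the SDK performs, and the function name within_miles is hypothetical.

```shell
#!/bin/sh
# Illustration only (not the SDK's implementation): succeed (exit 0) when two
# WGS84 points lie within max_miles of each other, using the haversine formula.
# Arguments: lon1 lat1 lon2 lat2 max_miles
within_miles() {
    awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" -v max="$5" 'BEGIN {
        pi = atan2(0, -1)
        rad = pi / 180
        dlat = (lat2 - lat1) * rad
        dlon = (lon2 - lon1) * rad
        a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
        # 3958.8 is an assumed mean earth radius in miles.
        d = 2 * 3958.8 * atan2(sqrt(a), sqrt(1 - a))
        exit ((d <= max) ? 0 : 1)
    }'
}
```

For example, within_miles -73.9857 40.7484 -73.9857 40.7534 0.5 succeeds because the two points are roughly a third of a mile apart. A real join would avoid comparing every pair of records, which is where the geohashPrecision parameter of the primary join comes in.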


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.


-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
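
When several TAB files need indexes, the single-file command can be wrapped in a loop. The following is a minimal sketch and not part of the product: the function name build_pgd_batch and its directory argument are hypothetical, and it assumes a Unix-like shell with PGDBuilder available on the PATH. It only prints the commands it would run, so the list can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product): print one PGDBuilder
# command per TAB file found directly inside a directory.
# $1 = directory containing TAB files, $2 = optional value for -p.
build_pgd_batch() {
    dir=$1
    parallel=${2:-}
    # Match .tab and .TAB alike; sort for a stable, reviewable order.
    find "$dir" -maxdepth 1 -type f -iname '*.tab' | sort | while IFS= read -r tab; do
        if [ -n "$parallel" ]; then
            echo "PGDBuilder -f $tab -p $parallel"
        else
            echo "PGDBuilder -f $tab"
        fi
    done
}
```

Once the printed list looks right, the echo can be replaced with the actual invocation. Because a changed TAB invalidates its PGD file, a loop like this is also a convenient way to regenerate indexes after a data refresh.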


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
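
After the final step, it can be useful to confirm that the directory really carries the expected mode before any job relies on it. The check below is a minimal sketch, not a product utility: the function name check_download_dir is hypothetical, and it assumes the mode string printed by ls -ld is a reliable indicator on your platform.

```shell
#!/bin/sh
# Hypothetical check (not part of the product): succeed only if the directory
# grants read, write, and execute to owner and group and read/execute to
# others, which is what mode 775 means.
# $1 = download directory, for example the configured download location.
check_download_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
    # The first 10 characters of ls -ld are the type and permission bits.
    mode=$(ls -ld "$dir" | cut -c1-10)
    if [ "$mode" = "drwxrwxr-x" ]; then
        echo "ok: $dir"
    else
        echo "unexpected mode $mode: $dir"
        return 1
    fi
}
```

Group membership can be confirmed separately with the standard id command, for example id -Gn hive, remembering that a running service or session only picks up a new group after it has been restarted.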


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
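
The two Like wildcards can be tried out away from the cluster by translating a pattern into a regular expression. This is an illustration of the wildcard semantics only, not how MI SQL evaluates Like: the function name like_match is hypothetical, matching here is case-sensitive, and the pattern is assumed to contain no other regular-expression metacharacters.

```shell
#!/bin/sh
# Illustration only: emulate the Like wildcards with a regular expression.
# % becomes ".*" (zero, one, or more characters); _ becomes "." (exactly one).
# Arguments: value pattern. Exit status 0 means the value matches.
like_match() {
    value=$1
    pattern=$2
    regex=$(printf '%s' "$pattern" | sed -e 's/%/.*/g' -e 's/_/./g')
    # -x requires the whole value to match, as a Like pattern does.
    printf '%s\n' "$value" | grep -q -x -e "$regex"
}
```

With this sketch, like_match "Central Park" "%Park" succeeds, while like_match "Park" "Park_" fails because the underscore requires exactly one more character.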


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples


Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In cases where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
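
When filter expressions are assembled in a script, the doubling rule is easy to apply mechanically. The helper below is a minimal sketch; the function name mi_sql_literal is hypothetical and it handles only the single-quote rule described here, not identifier quoting.

```shell
#!/bin/sh
# Hypothetical helper: wrap a value as an MI SQL string literal by doubling
# every embedded single quote and enclosing the result in single quotes.
mi_sql_literal() {
    escaped=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "'%s'" "$escaped"
}
```

For instance, mi_sql_literal "Joe's Diner" prints 'Joe''s Diner', which can then be placed after an = or Like operator in a filter expression.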


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Search Functions

The following Search functions are available:

• LocalPointInPolygon
• LocalSearchNearest

LocalPointInPolygon

Description

The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries within which the search point lies are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Function Registration

create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Syntax

LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing a point

dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.


options Map Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'


Return Values

Return Type Description

geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), '/STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.

options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option: maxCandidates
Description: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'

Option: maxDistance
Description: The maximum distance to search for results (if not set, the default value is no limit).
Example: 'maxDistance', '25'

Option: distanceUnit
Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'

Option: returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Option: queryFilter
Description: Search on a subset of the data based on an attribute.
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Example: 'queryFilter', "Name Like '%Park%'"

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_pointsid nearestresultcapital nearestresultstateFROM search_points LATERAL VIEW OUTERLocalSearchNearest(FromWKT(search_pointsgeometry search_pointscrs) STATECAPTAB

map(maxCandidates 3remoteDataSourceLocation hdfsdatasearchcapitalszipdownloadLocation pbsearchdownload downloadGroup pbdownloadsqueryFilter Name Like Park)) nearestresult

In the above example id is a field from the search_points table which is the table being used to get the points we are searching from The nearestresultcapital and nearestresultstate fields are from the STATECAP TAB file that we want in our query result In this particular example the maxCandidates option limits the results to 3 records for each search point

Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 71

Spatial

Spark

Spark Jobs

To process large sets of data the Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data

Hexagon Generator This Spark job generates the hexagons within a bounding box (for example the bounding box of the continental USA) Hexagon output can be used for map display

To create hexagons for a given bounding box

1 Modify the configuration according to the hexagons to be generated Change the bounding box coordinates and hexagon level to suit your needs See Hexagons to learn about hexagon levels

2 Deploy jar and configuration to the Hadoop cluster 3 Start the Spark job using the following command

spark-submit --class compbbigdataspatialhexsparkappHexGenDriver--master yarn --deploy-mode cluster --name ltAPPLICATION_NAMEgtdironserverspectrum-bigdata-li-spark2-hexgen-versionjar -output dironhdfsoutput -conf ltCONFIG_PATHgt -overwrite

The output of the HexGenerator is a list of WKT that represents the hexagons See Consuming Results for how to use the output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, representing the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.

Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the distance calculated.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

// col comes from org.apache.spark.sql.functions
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case:

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hue.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
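Because each run handles a single TAB file, indexing a whole directory means repeating the command. The following shell sketch is one hedged way to do that on a Linux host: it prints one PGDBuilder command per TAB file it finds, without invoking the tool. The function name pgd_commands, the DATA_DIR variable, and the assumption that PGDBuilder is on the PATH under that name are illustrative and not part of the product.

```shell
# Sketch: print one PGDBuilder command per TAB file in a directory (dry run).
# pgd_commands and DATA_DIR are hypothetical names; PGDBuilder is not executed here.
pgd_commands() {
  dir="$1"        # directory that holds the TAB files
  parallel="$2"   # value passed to -p
  for tab in "$dir"/*.tab "$dir"/*.TAB; do
    [ -e "$tab" ] || continue   # skip patterns that matched nothing
    printf 'PGDBuilder -f %s -p %s\n' "$tab" "$parallel"
  done
}

pgd_commands "${DATA_DIR:-.}" 4
```

Review the printed commands, then run them (for example, by piping the output to sh) once the paths look right.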


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that includes every service user that needs to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window in which no job is running, restart all the services whose operating system users were added to the new group.

4. During a window in which no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Set the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
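After the steps above, it can be useful to confirm that the directory ended up with the expected group and mode. The following is a minimal sketch that assumes a Linux host with GNU coreutils stat; the function name check_download_dir is hypothetical and is not part of the product.

```shell
# Sketch: verify that a download directory has the expected group and mode 775.
# check_download_dir is a hypothetical helper; it relies on GNU stat (-c).
check_download_dir() {
  dir="$1"     # directory to inspect
  group="$2"   # expected group name
  actual_group=$(stat -c '%G' "$dir") || return 2
  actual_mode=$(stat -c '%a' "$dir") || return 2
  # Succeeds only when both the group and the octal mode match.
  [ "$actual_group" = "$group" ] && [ "$actual_mode" = "775" ]
}
```

For example, `check_download_dir /pb/downloads pbdownloads && echo "permissions OK"` prints the message only when both checks pass.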


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
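The two Like wildcards behave much like shell glob characters, which gives a quick way to reason about a pattern before using it in a filter. The following sketch is only an analogy and is not how the product evaluates Like: it maps % to * and _ to ?, compares case-sensitively (an assumption), and does not handle patterns that contain glob characters of their own. The function name like_match is hypothetical.

```shell
# Sketch: emulate the Like wildcards with shell glob patterns (% -> *, _ -> ?).
# like_match is a hypothetical helper for illustration only.
like_match() {
  value="$1"
  # Translate the Like pattern into a glob pattern.
  pattern=$(printf '%s' "$2" | sed -e 's/%/*/g' -e 's/_/?/g')
  case "$value" in
    $pattern) return 0 ;;   # unquoted so the shell treats it as a pattern
    *) return 1 ;;
  esac
}

like_match "Central Park" "%Park%" && echo "match"
```

Here the percent signs on both sides of Park allow any text before and after it, so the call succeeds; a pattern such as Park_ would instead require exactly one character after Park.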


Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.
, | List items and function argument separators
 | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
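When filter expressions are assembled in a script, the doubling rule is easy to get wrong by hand. The following shell sketch shows one way to apply it; the function name sql_literal is hypothetical and is not part of the product.

```shell
# Sketch: turn a raw value into a single-quoted string literal by doubling
# every embedded single quote. sql_literal is a hypothetical helper.
sql_literal() {
  # sed doubles each single quote; printf wraps the result in single quotes.
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

sql_literal "O'hara's"   # prints 'O''hara''s'
```

The helper covers only single-quote doubling inside string values; identifiers that need double quotes are a separate case.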


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Parameter | Type | Description
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'


Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries in which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


LocalSearchNearest

Description

The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance on page 33 function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like '%Park%'"

Return Values

Return Type | Description
geometry | The nearest geometry or geometries, contained in the specified TAB or shapefile, to the input point.

Examples

Using HDFS

SELECT search_pointsid nearestresultcapital nearestresultstateFROM search_points LATERAL VIEW OUTERLocalSearchNearest(FromWKT(search_pointsgeometry search_pointscrs) STATECAPTAB

map(maxCandidates 3remoteDataSourceLocation hdfsdatasearchcapitalszipdownloadLocation pbsearchdownload downloadGroup pbdownloadsqueryFilter Name Like Park)) nearestresult

In the above example id is a field from the search_points table which is the table being used to get the points we are searching from The nearestresultcapital and nearestresultstate fields are from the STATECAP TAB file that we want in our query result In this particular example the maxCandidates option limits the results to 3 records for each search point

Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 71

Spatial

Spark

Spark Jobs

To process large sets of data the Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data

Hexagon Generator This Spark job generates the hexagons within a bounding box (for example the bounding box of the continental USA) Hexagon output can be used for map display

To create hexagons for a given bounding box

1 Modify the configuration according to the hexagons to be generated Change the bounding box coordinates and hexagon level to suit your needs See Hexagons to learn about hexagon levels

2 Deploy jar and configuration to the Hadoop cluster 3 Start the Spark job using the following command

spark-submit --class compbbigdataspatialhexsparkappHexGenDriver--master yarn --deploy-mode cluster --name ltAPPLICATION_NAMEgtdironserverspectrum-bigdata-li-spark2-hexgen-versionjar -output dironhdfsoutput -conf ltCONFIG_PATHgt -overwrite

The output of the HexGenerator is a list of WKT that represents the hexagons See Consuming Results for how to use the output

Sample Output

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 72

Spatial

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation such as cell tower strength or noise pollution Closely approximating circles hexagons include edge data better than if rectangles were used Hexagons also fit together well across a space

Spectrumtrade Location Intelligence for Big Data provides API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis The compbhadoopcorehex package contains classes for working with hexagons and retrieving information about them Refer to Geohash Aggregation for details about using the API

The API provides an interface that assigns a hexagon and ID to each location and that ID is used to aggregate the data associated with the hexagon

One important hexagon parameter is the hexagon level This along with the longitude and latitude of a record is used to get the hexagon or its ID for the location

The hexagon level refers to a hierarchy of hexagons that divide the earths surface Level 1 refers to the whole earth Subsequent levels divide the previous level evenly into smaller units The smaller the number the higher the level and the larger the hexagon size These hexagons form a fixed network with each hexagon having a specific unique identifier (ID)

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
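The guide does not say how to choose geohashPrecision relative to maxDistance. As a rough aid, the sketch below uses the standard geohash definition (5 bits per character, with bits alternating between longitude and latitude, starting with longitude) to print the approximate cell size in degrees for a given precision. The function name geohash_cell_degrees is hypothetical, and whether the product's primary join uses geohash cells in exactly this way is an assumption.

```shell
#!/bin/sh
# Approximate geohash cell size for a precision between 1 and 12.
# Standard geohash: 5 bits per character; bits alternate longitude,
# latitude, starting with longitude.
# Prints: <longitude bits> <latitude bits> <width in degrees> <height in degrees>
geohash_cell_degrees() {
  precision=$1
  if [ "$precision" -lt 1 ] || [ "$precision" -gt 12 ]; then
    echo "precision must be between 1 and 12" >&2
    return 1
  fi
  awk -v p="$precision" 'BEGIN {
    bits = 5 * p
    lon_bits = int((bits + 1) / 2)   # longitude gets the extra bit when odd
    lat_bits = int(bits / 2)
    printf "%d %d %.6f %.6f\n", lon_bits, lat_bits, 360 / 2^lon_bits, 180 / 2^lat_bits
  }'
}

geohash_cell_degrees 7
```

At precision 7 the cell is about 0.0014 degrees on each side, roughly 150 m at the equator, which is smaller than the half-mile (about 805 m) search radius used in the examples in this section.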

Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.
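When the same driver is submitted repeatedly, it can help to assemble the command line in one place. The following is a minimal sketch that only prints the spark-submit command shown above. The function name li_submit_command and the LI_JAR variable are hypothetical, and the jar path assumes the install location used in this guide.

```shell
#!/bin/sh
# Build (print, not run) the spark-submit command line for a driver that
# depends on the LI jar. LI_JAR follows the install path used in this guide;
# replace "version" with the installed version.
LI_JAR="/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar"

li_submit_command() {
  driver_class=$1
  driver_jar=$2
  shift 2
  printf 'spark-submit --class %s --master yarn --deploy-mode cluster --jars %s %s' \
    "$driver_class" "$LI_JAR" "$driver_jar"
  # Append any driver parameters after the driver jar.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

li_submit_command com.example.spark.app.MyDriver /path/to/custom/driver.jar driverParameter1 driverParameter2
```

Printing the command first makes it easy to review the class name and jar paths before running it on the cluster.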

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
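The EMR edit in step 2 could also be scripted. The sketch below prints a modified copy of a hue.ini-style file instead of editing it in place, so the result can be reviewed first. The function name enable_default_configuration is hypothetical, and the exact form of the commented-out line (leading spaces, number of # characters) is an assumption that may differ between Hue versions.

```shell
#!/bin/sh
# Print a copy of a hue.ini-style file with use_default_configuration
# uncommented and set to true. The input file itself is not modified.
enable_default_configuration() {
  sed 's/^\([[:space:]]*\)#*[[:space:]]*use_default_configuration=false/\1use_default_configuration=true/' "$1"
}

# Example usage (review the output before replacing the original file):
# enable_default_configuration /etc/hue/conf/hue.ini > /tmp/hue.ini.new
```

Lines that do not match the pattern pass through unchanged, so the output can be compared with the original using diff.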

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
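Because one PGD file is built per TAB file, a directory of TAB files needs one PGDBuilder call per file. The sketch below prints those calls for review rather than running them. The function name print_pgd_commands is hypothetical, and the sketch assumes a Unix-like shell, lower-case .tab extensions, and that PGDBuilder is invoked exactly as in the usage line above.

```shell
#!/bin/sh
# Print one PGDBuilder command per .tab file found directly in a directory.
# An optional second argument is passed through as the -p value.
print_pgd_commands() {
  dir=$1
  parallel=${2:-}
  for tab in "$dir"/*.tab; do
    [ -f "$tab" ] || continue   # skip the unmatched glob when no .tab files exist
    if [ -n "$parallel" ]; then
      echo "PGDBuilder -f \"$tab\" -p $parallel"
    else
      echo "PGDBuilder -f \"$tab\""
    fi
  done
}

# Example usage:
# print_pgd_commands /data/tabs 4
```

Remember that a PGD file must be regenerated whenever the data in its TAB file changes, so a listing like this is a natural step to repeat after each data update.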

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window in which no job is running, restart all the services whose operating system users were added to the new group.

4. During a window in which no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group, and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
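The commands in steps 1, 2, and 5 follow a fixed pattern, so they can be generated for whichever services are installed. The sketch below only prints the commands. The function name print_group_setup is hypothetical, and the restarts in steps 3 and 4 are left out because they depend on how the cluster is managed.

```shell
#!/bin/sh
# Print (not run) the group setup commands for a list of service users.
# Arguments: <group> <download directory> <user>...
print_group_setup() {
  group=$1
  dir=$2
  shift 2
  echo "sudo groupadd $group"
  for user in "$@"; do
    echo "sudo usermod -a -G $group $user"
  done
  echo "sudo chgrp $group $dir"
  echo "sudo chmod 775 $dir"
}

print_group_setup pbdownloads /pb/downloads hive yarn pbuser
```

Review the printed commands, then run them as a user with sudo rights, restarting the affected services and sessions before relying on the new group membership.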

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
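The wildcard rules for Like can be tried out with ordinary shell pattern matching. The sketch below translates a Like pattern into a shell glob, with "_" becoming "?" and "%" becoming "*". The function name like_match is hypothetical; the sketch assumes the pattern contains no characters that are special to shell globs, and it matches case-sensitively, which may not reflect how MI SQL compares strings.

```shell
#!/bin/sh
# Emulate Like matching by translating the pattern to a shell glob:
# "_" becomes "?" (exactly one character), "%" becomes "*" (zero or more).
# Returns 0 when the value matches the pattern, 1 otherwise.
like_match() {
  value=$1
  glob=$(printf '%s' "$2" | tr '_%' '?*')
  case $value in
    $glob) return 0 ;;
    *)     return 1 ;;
  esac
}

like_match "Central Park" "%Park" && echo "match"
like_match "Park" "_Park" || echo "no match"
```

The second call fails to match because the underscore requires exactly one character before "Park", whereas the percent sign in the first call accepts any number of characters, including none.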

Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.
, | List item and function argument separators
 | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
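When filter expressions are assembled in a script, the doubling rule can be applied mechanically. The following is a minimal sketch; the function name mi_sql_literal is hypothetical and the helper is not part of the product.

```shell
#!/bin/sh
# Wrap a value in single quotes for use as an MI SQL string literal,
# doubling any embedded single quote.
mi_sql_literal() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

mi_sql_literal "O'hara's"
```

The call above prints 'O''hara''s', which is the form used in the WHERE clause of the last example.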

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
        • Grid Functions
          • GeoHashBoundary
          • GeoHashID
          • HexagonBoundary
          • HexagonID
          • SquareHashBoundary
          • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
          • Hexagons
      • Spark API
        • JoinByDistance
      • Location Intelligence Jar for Spatial Operations
        • Installing and setting up the Location Intelligence Jar for a Spark Job
        • Integrating the Location Intelligence Jar with Zeppelin
        • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules
Return Values

Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries in which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.

Examples

Using HDFS

SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult

In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

LocalSearchNearest

Description

The LocalSearchNearest UDTF function returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options | map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like '%Park%'"

Return Values

Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'"))
nearestresult

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT geometries that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 72

Spatial

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation such as cell tower strength or noise pollution Closely approximating circles hexagons include edge data better than if rectangles were used Hexagons also fit together well across a space

Spectrumtrade Location Intelligence for Big Data provides API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis The compbhadoopcorehex package contains classes for working with hexagons and retrieving information about them Refer to Geohash Aggregation for details about using the API

The API provides an interface that assigns a hexagon and ID to each location and that ID is used to aggregate the data associated with the hexagon

One important hexagon parameter is the hexagon level This along with the longitude and latitude of a record is used to get the hexagon or its ID for the location

The hexagon level refers to a hierarchy of hexagons that divide the earths surface Level 1 refers to the whole earth Subsequent levels divide the previous level evenly into smaller units The smaller the number the higher the level and the larger the hexagon size These hexagons form a fixed network with each hexagon having a specific unique identifier (ID)

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 73

Spatial

Spectrumtrade Location Intelligence for Big Data supports Levels 1 through 11 with level 9 as the default Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator It will generate each hexagon with a unique and consistent ID at a given level for the same longitudelatitude

Spark API

The Spark2 jar contains both the Spark API and LI SDK API Assuming that the product was installed to pblisoftware as described in Installing the SDK on page 8 the jar filename and path is

pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar

For installation instructions see Location Intelligence Jar for Spatial Operations on page 77

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 74

Spatial

JoinByDistance

Description

joinByDistance is an implicit method which joins two dataframes taking longitude and latitude values one set from each dataframe representing the location of the records to be joined This method can be used to enrich a CSV containing point data with attributes associated with points within some max distance for example finding all the POIs within half of mile of each of your customers

Syntax

import compbbigdatalisparkapiSpatialImplicits_

joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int) DataFrame

joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int options Map[DistanceJoinOptionDistanceJoinOption Any]) DataFrame

Parameters

Note The coordinate values must be in the CoordSysConstantslongLatWGS84 coordinate system

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join Value must be between 1 and 12 The higher the number the more memory may be required

options Map Optional Options that add extra attributes to the result of the join

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 75

Spatial

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from the second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

// DistanceColumnName is the option key listed under Options
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.
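
The command above can be kept in a small wrapper so that the LI jar path and driver arguments are passed in one place. The following is a minimal sketch: the function name is hypothetical, the driver class and jar paths are the placeholders used in this guide, and the function only prints the command (a dry run) rather than running spark-submit.

```shell
#!/usr/bin/env bash
# Sketch: assemble the spark-submit command line as an array and print it.
# Arguments: <LI jar path> <driver jar path> [driver parameters...]
build_submit_cmd() {
  local li_jar="$1" driver_jar="$2"
  shift 2
  local cmd=(spark-submit
    --class com.example.spark.app.MyDriver
    --master yarn
    --deploy-mode cluster
    --jars "$li_jar"
    "$driver_jar" "$@")
  # Print instead of executing so the command can be reviewed first
  printf '%s\n' "${cmd[*]}"
}

build_submit_cmd \
  /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2
```

To run the job instead of printing it, the last line of the function could be replaced with "${cmd[@]}".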


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
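
For the EMR step that uncomments and changes use_default_configuration, the edit can be previewed from the command line before touching the real file. This is a rough sketch under stated assumptions: the function name is hypothetical, the line is assumed to be commented out with leading "#" or ";" characters, and the result is printed to standard output rather than written back. Any leading indentation on that line is dropped.

```shell
#!/usr/bin/env bash
# Sketch: print a copy of a hue.ini-style file with
# use_default_configuration uncommented and set to true.
enable_default_configuration() {
  local ini="$1"
  sed -E 's/^[#;[:space:]]*use_default_configuration[[:space:]]*=.*$/use_default_configuration=true/' "$ini"
}

# Example (review the output before replacing the original file):
# enable_default_configuration /etc/hue/conf/hue.ini > /tmp/hue.ini.new
```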


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright


PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.


-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
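
Because each invocation handles a single TAB file, indexing a whole directory means one command per file. The sketch below prints those commands for review (a dry run). It assumes a Unix-like shell and that PGDBuilder is invoked exactly as shown under Usage; the function name and directory layout are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: print one PGDBuilder command per TAB file found in a directory.
# Arguments: <directory> [max concurrent processes]
pgd_commands() {
  local dir="$1" parallel="${2:-}"
  local f
  for f in "$dir"/*.tab "$dir"/*.TAB; do
    [ -e "$f" ] || continue   # skip patterns that matched nothing
    if [ -n "$parallel" ]; then
      echo "PGDBuilder -f $f -p $parallel"
    else
      echo "PGDBuilder -f $f"
    fi
  done
}

# Example: list the commands for every TAB file under ./data, 4 processes each
# pgd_commands ./data 4
```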


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to that common group and grant Read, Write, and Execute permissions to the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window in which no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window in which no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update the permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
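
After these steps, the result can be checked without inspecting ls output by hand. The following is a minimal sketch, not part of the product: the function name is hypothetical, and it uses only the standard find tests -perm and -group, so it succeeds only when the directory has exactly mode 775 and the expected group.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the directory has mode 775 and the given group.
# Arguments: <directory> <group name or numeric gid>
check_download_dir() {
  local dir="$1" group="$2"
  # find prints the directory itself only when both tests match
  [ -n "$(find "$dir" -prune -perm 775 -group "$group" 2>/dev/null)" ]
}

# Example, using the directory and group from the steps above:
# check_download_dir /pb/downloads pbdownloads && echo "permissions ok"
```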


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first object's centroid is within the second object

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object

In (List) Returns true if the value equals at least one of the values in the literal list or subquery

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used
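
The wildcard behavior described for Like can be tried out by translating a pattern into a shell glob, where "_" maps to "?" and "%" maps to "*". This is an illustration of the matching rules only, not the MI SQL implementation: the function name is hypothetical, escaping is ignored, and matching here is case-sensitive, which may differ from the product.

```shell
#!/usr/bin/env bash
# Sketch: emulate Like matching with bash pattern matching.
# Arguments: <value> <Like pattern>
like_match() {
  local value="$1" pattern="$2"
  pattern=${pattern//_/?}    # "_" matches exactly one character
  pattern=${pattern//%/*}    # "%" matches zero, one, or more characters
  # An unquoted right-hand side is treated as a pattern by [[ ... == ... ]]
  [[ "$value" == $pattern ]]
}

like_match "Park Avenue" "Park%" && echo "Park Avenue matches Park%"
like_match "ark" "_ark" || echo "ark does not match _ark"
```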


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

@ Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers that contain characters where they normally would not be allowed (such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
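
When a filter expression is assembled in a script, the doubling rule can be applied mechanically. This is a minimal sketch of that one rule; the function name is hypothetical, and it does not validate the rest of the expression.

```shell
#!/usr/bin/env bash
# Sketch: wrap a value in single quotes for use as a string literal,
# doubling any single quote it contains.
sql_quote() {
  local s="$1"
  local q="'"
  printf "'%s'\n" "${s//$q/$q$q}"
}

# Build the WHERE clause from the example above
echo "SELECT * FROM Streets WHERE Business = $(sql_quote "O'hara's")"
```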


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
        • Grid Functions
          • GeoHashBoundary
          • GeoHashID
          • HexagonBoundary
          • HexagonID
          • SquareHashBoundary
          • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
          • Hexagons
      • Spark API
        • JoinByDistance
      • Location Intelligence Jar for Spatial Operations
        • Installing and setting up the Location Intelligence Jar for a Spark Job
        • Integrating the Location Intelligence Jar with Zeppelin
        • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules

LocalSearchNearest

Description

The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point.

Function Registration

create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

Syntax

LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])

Parameters

Parameter Type Description

inputPoint WritableGeometry A WritableGeometry representing the point to search near.

dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.

Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.

options map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option | Description | Example

maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'

maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'

distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'

returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'

shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'

shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'

remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'

downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'

queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like '%Park%'"

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like '%Park%'")) nearestresult;

In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.


Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
  --master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
  /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
  -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The product generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 74

Spatial

JoinByDistance

Description

joinByDistance is an implicit method which joins two dataframes taking longitude and latitude values one set from each dataframe representing the location of the records to be joined This method can be used to enrich a CSV containing point data with attributes associated with points within some max distance for example finding all the POIs within half of mile of each of your customers

Syntax

import compbbigdatalisparkapiSpatialImplicits_

joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int) DataFrame

joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int options Map[DistanceJoinOptionDistanceJoinOption Any]) DataFrame

Parameters

Note The coordinate values must be in the CoordSysConstantslongLatWGS84 coordinate system

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join Value must be between 1 and 12 The higher the number the more memory may be required

options Map Optional Options that add extra attributes to the result of the join

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 75

Spatial

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 5 mile buffer around each point in the first dataframe

val searchRadius = 05 val distanceUnit = mi val distance = new commapinfomidevunitLength(searchRadiustoDoublecommapinfomidevunitLinearUnitgetFromMapInfoCode(distanceUnit))

val resultDF = baseDFjoinByDistance(joinDF col(longitude) col(latitude) col(lon)col(lat) distance 7)

Example showing options set

val distance = new Length(05 LinearUnitgetFromMapInfoCode(mi))

val resultDF = baseDF(joinDF baseDF(Longitude) baseDF(Latitude) joinDF(Lon)joinDF(Lat) distance 7 Map(DistanceColumnName -gt outputDistance))

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 76

Spatial

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark a Location Intelligence jar file (LI jar) is available which supports compiling and running location intelligence programs

The LI jar also includes the Download Manager API which provides the capability to manage downloading reference data programmatically The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download() In addition the Download Manager can set permissions in the case where you have multiple users interacting with the data

bull For more information about the Download Manager API see the Javadocs located in pblisoftwarespark2sdkjavadocsdownloadmanager

There are several options for deploying the LI jar refer to the following table to proceed according to your use case

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1 Deploy the spectrum-bigdata-li-sdk-spark2-versionjar to the Hadoop cluster 2 Start the Spark job using the following command

Note This example assumes the product is installed to pblisoftware as described in Installing the SDK on page 8

spark-submit --class comexamplesparkappMyDriver--master yarn --deploy-mode cluster--jars pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar pathtocustomdriverjar driverParameter1 driverParameter2

where

bull --jars The path to the jar file

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 77

Spatial

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin

1 On the navigation bar using the Settings drop-down menu and click Interpreter The Interpreters page displays

2 Go to the Spark2 section click Edit The editing view displays

3 Go to the Dependencies section Enter the full local path to the jar in the artifact field

Note This example assumes the product is installed to pblisoftware as described in Installing the SDK on page 8

pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar

4 Click Save This will restart the interpreter with the loaded library

5 To return to your notebook click the Notebook drop-down menu and select your notebook The LI jar is now available for all notebooks

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 78

Spatial

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
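The EMR edit in step 2 (uncommenting the use_default_configuration line and setting it to true) could also be scripted. The following is a minimal sketch, assuming the line appears at most once per file and may be commented out with one or more # characters; the function name enable_default_config is hypothetical. It keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enables use_default_configuration in a hue.ini-style file.
enable_default_config() {
  local ini="$1"
  # Match the key whether or not it is commented out with leading '#' characters,
  # then replace the whole line with the enabled setting. A .bak backup is kept.
  sed -i.bak -E 's/^[[:space:]]*#*[[:space:]]*use_default_configuration[[:space:]]*=.*/use_default_configuration=true/' "$ini"
}
```

Running it against a copy of the file first, and comparing the copy with its .bak backup, is a cautious way to confirm the change before touching the live configuration.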

3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
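Because one PGD file is generated per TAB file, a directory holding several TAB files needs one PGDBuilder invocation per file. The sketch below is a dry run that only prints those invocations, using the -f and -p options described above. The function name pgd_commands is hypothetical, and the sketch assumes that PGDBuilder is on the PATH and that file paths contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints one PGDBuilder command per TAB file in a directory.
# It does not run PGDBuilder; review the output before executing it.
pgd_commands() {
  local dir="$1" parallel="${2:-}"
  local tab
  # Match the .tab extension in any letter case.
  for tab in "$dir"/*.[tT][aA][bB]; do
    [ -e "$tab" ] || continue   # skip the literal pattern when nothing matches
    if [ -n "$parallel" ]; then
      printf 'PGDBuilder -f %s -p %s\n' "$tab" "$parallel"
    else
      printf 'PGDBuilder -f %s\n' "$tab"
    fi
  done
}
```

The optional second argument caps concurrency, which the parameter description suggests lowering for seamless TABs.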

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
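After the last step, it can be useful to confirm that the download directory really carries 775 permissions. The check below is a minimal sketch that reads the permission string from ls -ld; the function name has_mode_775 is hypothetical, and a directory with extra bits set (for example setgid) would be reported as not matching.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds only if the path is a directory with mode rwxrwxr-x (775).
has_mode_775() {
  local dir="$1" perms
  [ -d "$dir" ] || return 1
  # The first 10 characters of `ls -ld` output are the file type and permission bits.
  perms="$(ls -ld "$dir" | cut -c1-10)"
  [ "$perms" = "drwxrwxr-x" ]
}
```

For example, has_mode_775 /pb/downloads returns success once the chmod above has been applied to that directory.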

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
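The Like wildcard rules can be illustrated outside MI SQL by translating a pattern into a shell glob, where % becomes * and _ becomes ?. This is only a sketch of the semantics described in the table, not the product's implementation: the function name like_match is hypothetical, matching here is case-sensitive (whether Like is case-sensitive in MI SQL is not stated in this guide), and patterns that contain glob characters such as [ or * are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of Like wildcard semantics using shell pattern matching.
like_match() {
  local value="$1" pattern="$2" glob
  # Translate SQL wildcards to glob wildcards: % -> * (zero or more), _ -> ? (exactly one).
  glob="$(printf '%s' "$pattern" | sed -e 's/%/*/g' -e 's/_/?/g')"
  case "$value" in
    $glob) return 0 ;;   # unquoted on purpose so the glob is treated as a pattern
    *) return 1 ;;
  esac
}
```

With this sketch, like_match "Central Park" "%Park" succeeds, while like_match "Parking" "%Park" fails because nothing may follow Park in that pattern.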

Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, | List items and function argument separators
@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
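When filter expressions are generated by a script, the doubling rule for embedded single quotes can be applied mechanically. The helper below is a minimal sketch of that rule only; the name sql_quote is hypothetical, and it is not a substitute for whatever escaping the calling environment (for example, a Hive string literal) additionally requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wraps a value in single quotes and doubles any embedded single quote.
sql_quote() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}
```

Passing O'haras to sql_quote yields the quoted literal used in the last example above.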

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
      • Grid Functions
        • GeoHashBoundary
        • GeoHashID
        • HexagonBoundary
        • HexagonID
        • SquareHashBoundary
        • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
        • Hexagons
      • Spark API
        • JoinByDistance
      • Location Intelligence Jar for Spatial Operations
        • Installing and setting up the Location Intelligence Jar for a Spark Job
        • Integrating the Location Intelligence Jar with Zeppelin
        • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules

Option | Description | Example
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance on page 33 function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. | 'queryFilter', 'Name Like "%Park%"'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.

Return Values

Return Type | Description
geometry | The nearest geometry or geometries contained in the specified TAB or shapefile to the input point.

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like "%Park%"')) nearestresult

In the above example, id is a field from the search_points table, which is the table being used to get the points we are searching from. The nearestresult.capital and nearestresult.state fields are from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.

Spark

Spark Jobs

To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).

Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.

JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.

Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the distance calculated.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79


Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 84

Appendix

Arithmetic Operators

Operator Definition

+ Addition also concatenation operator NOTE String concatenation also uses amp

- Subtraction

Multiplication

Division

^ Exponentiation

Note Operator math on Time or DateTime is not supported You can add a number to a Date but not to a Time or DateTime

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

String constant delimiters See Quote Rules on page 85

Quoted identifier delimiters

_ Wildcard symbols represents zero one or more characters the_ (underscore) represents a single character

List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules String literals (values) must be enclosed in single quotation marks (example) while identifiers (column names table names aliases and so on) should be enclosed in double quotation marks (example identifier) if necessary Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier This would include identifiers that have spaces in their names or other special characters

Examples

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 85

Appendix

Identifiers or illegal characters (where they normally would not be allowed such as ) are surrounded by double-quotes

SELECT FROM SamplesNamedTablesUSA

String literals or values are surrounded by single-quotes

SELECT FROM SamplesNamedTablesUSA WHERE Country = Canada

In certain cases where a single-quote is within a string literal or value use a double-single-quote (two characters) In the following example the string literal Oharas is defined

SELECT FROM Streets WHERE Business = Oharas

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 86

Appendix

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives No part of this document may be reproduced or transmitted in any form or by any means electronic or mechanical including photocopying without the written permission of Pitney Bowes Software Inc 350 Jordan Road Troy New York 12180 copy 2020 Pitney Bowes Software Inc All rights reserved Location Intelligence APIs are trademarks of Pitney Bowes Software Inc All other marks and trademarks are property of their respective holders

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 87

pitney bowes 0 3001 Summer Street

Stamford CT 06926-0700

USA

wwwpitneybowescom

copy 2020 Pitney Bowes Software Inc

All rights reserved


storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions.

queryFilter Search on a subset of the data based on an attribute. Example: queryFilter, Name Like '%Park%'

Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.

Return Values

Return Type Description

geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point

Examples

Using HDFS

SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3',
'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
'downloadLocation', '/pb/search/download',
'downloadGroup', 'pbdownloads',
'queryFilter', "Name Like '%Park%'")) nearestresult

In the above example, id is a field from the search_points table, which supplies the points being searched from. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the fields wanted in the query result. Here, the maxCandidates option limits the results to 3 records for each search point.

Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.


Spark

Spark Jobs

Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location; that ID is then used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces the same hexagon, with a unique and consistent ID.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 within which to search for point 2.

geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
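To make the maxDistance idea concrete, the following shell sketch computes a great-circle distance between two longitude/latitude pairs and checks it against a radius in miles. It is an illustration only: the haversine formula, the mean earth radius, and the function names haversine_miles and within_distance are assumptions made here, and the library may calculate distance differently.

```shell
#!/usr/bin/env bash
# Illustrative only: haversine_miles and within_distance are hypothetical
# helpers, not part of the Location Intelligence SDK.

# Great-circle distance in miles between two WGS84 points (haversine formula).
# Arguments: lon1 lat1 lon2 lat2, all in decimal degrees.
haversine_miles() {
  awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" 'BEGIN {
    pi = atan2(0, -1)
    rad = pi / 180
    r = 3958.7613                       # assumed mean earth radius in miles
    dlat = (lat2 - lat1) * rad
    dlon = (lon2 - lon1) * rad
    a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
    printf "%.4f\n", 2 * r * atan2(sqrt(a), sqrt(1 - a))
  }'
}

# Succeeds (exit status 0) when the two points are within max_miles of each other.
# Arguments: lon1 lat1 lon2 lat2 max_miles
within_distance() {
  local d
  d=$(haversine_miles "$1" "$2" "$3" "$4")
  awk -v d="$d" -v max="$5" 'BEGIN { exit !(d <= max) }'
}
```

For example, within_distance 0 0 0.005 0 0.5 succeeds because the two points are well under half a mile apart, while the same call with a second longitude of 1 fails.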


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin

Hue notebook Integrating the Location Intelligence Jar with Hue

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:

use_default_configuration=false

to:

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where the LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have a PGD file created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.


-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
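When a directory holds several TAB files, the commands above can be generated in a loop. The sketch below only prints one PGDBuilder command per TAB file so the list can be reviewed before anything runs. The function name print_pgd_commands and its arguments are hypothetical, and the sketch assumes a Unix-like shell with find available; it uses only the -f and -p options described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one PGDBuilder command per TAB file found
# directly under the given directory. Nothing is executed.
# Arguments: directory [max_parallel]
print_pgd_commands() {
  local dir=$1 parallel=${2:-}
  local tab
  # -iname matches both .tab and .TAB; -maxdepth 1 keeps the search
  # to the directory itself.
  find "$dir" -maxdepth 1 -type f -iname '*.tab' | sort | while IFS= read -r tab; do
    if [ -n "$parallel" ]; then
      printf 'PGDBuilder -f "%s" -p %s\n' "$tab" "$parallel"
    else
      printf 'PGDBuilder -f "%s"\n' "$tab"
    fi
  done
}
```

Calling print_pgd_commands /data/tabs 4 would list a command for each TAB file in /data/tabs, each capped at 4 concurrent processes; the directory name is only an example.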


Download Permissions

Setting the download permissions allows multiple services to download data and update the downloaded data when required. All the service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have 775 permissions, which grant Read, Write, and Execute to the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the session of every operating system user that was added to the new group (for example, pbuser).

5. Set the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
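After completing the steps above, it can be useful to confirm that the directory ended up with the intended group and mode. The following sketch does that by reading the output of ls -ld. The function name check_download_dir is hypothetical, and the check is deliberately strict: it expects exactly rwxrwxr-x, so a directory with the setgid bit set would be reported as not matching.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the directory is group-owned by the
# expected group and its mode is rwxrwxr-x (775).
# Arguments: directory expected_group
check_download_dir() {
  local dir=$1 group=$2
  local perms owner_group
  # ls -ld prints, for example: drwxrwxr-x 2 pbuser pbdownloads ...
  read -r perms _ _ owner_group _ < <(ls -ld "$dir")
  # Keep the type flag plus nine permission characters; this drops any
  # trailing ACL or extended-attribute marker such as + or @.
  perms=${perms:0:10}
  [ "$perms" = "drwxrwxr-x" ] && [ "$owner_group" = "$group" ]
}
```

A typical call would be check_download_dir /pb/downloads pbdownloads && echo "download directory is ready", where the path is whatever your download location property points to.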


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
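The wildcard behavior described for Like can be tried out in a shell by translating a Like pattern into a shell glob, where % becomes * and _ becomes ?. The sketch below is only an approximation for experimenting with patterns: like_match is a hypothetical helper, and it assumes case-sensitive matching, which may not reflect how MI SQL compares strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when value matches a Like-style pattern.
# % matches zero or more characters and _ matches exactly one character.
# Arguments: value pattern
like_match() {
  local value=$1 pattern=$2 glob="" ch i
  for (( i = 0; i < ${#pattern}; i++ )); do
    ch=${pattern:i:1}
    case $ch in
      %) glob+='*' ;;                         # Like % -> glob *
      _) glob+='?' ;;                         # Like _ -> glob ?
      '*'|'?'|'['|']'|'\') glob+="\\$ch" ;;   # escape glob metacharacters
      *) glob+=$ch ;;
    esac
  done
  # The unquoted right-hand side is treated as a glob pattern by [[ ]].
  [[ $value == $glob ]]
}
```

With this helper, like_match "Central Park" "%Park%" succeeds, and like_match "Park" "Park_" fails because the underscore requires one more character.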


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. Note: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples


Identifiers that contain characters that would normally not be allowed (such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
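When a filter expression is assembled in a script, the doubling rule can be applied mechanically. The following sketch wraps a value in single quotes and doubles any single quote inside it. The function name sql_string_literal is hypothetical and covers only the string-literal rule described above, not identifier quoting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wraps a value in single quotes for use as an MI SQL
# string literal, doubling any single quote inside it.
# Arguments: value
sql_string_literal() {
  local value=$1 q="'"
  # Replace every ' with '' and surround the result with single quotes.
  printf "'%s'" "${value//$q/$q$q}"
}
```

For instance, printf 'SELECT * FROM Streets WHERE Business = %s\n' "$(sql_string_literal "O'hara's")" builds the same statement as the last example above.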


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

Page 72: Spectrum Location Intelligence - Pitney Bowes · 2020-01-28 · Location Intelligence for Big Data 4.0 Spectrum Location Intelligence for Big Data User Guide 6. 2 -Spatial. This section

Spatial

Spark

Spark Jobs

To process large data sets, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.

Hexagon Generator

This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.

To create hexagons for a given bounding box:

1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.

2. Deploy the jar and configuration to the Hadoop cluster.

3. Start the Spark job using the following command:

spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite

The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.

Sample Output


Hexagons

A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.

Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.

The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.

One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.

The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).


Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always yields the same hexagon, with a unique and consistent ID.
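The assign-an-ID-then-aggregate pattern described in this section can be sketched in plain Scala. This is only an illustration of the idea, not the product's hexagon API: the cellId function below is a hypothetical stand-in that uses a square grid instead of hexagons, and the Reading class, its signal field, and the cell size are assumptions made for the example.

```scala
object CellAggregationSketch {
  // Hypothetical record type: a location plus one measured value.
  final case class Reading(lon: Double, lat: Double, signal: Double)

  // Stand-in for the product's hexagon ID lookup (assumption): a square grid
  // cell keyed by (column, row). It is NOT the hexagon scheme; it only plays
  // the same role of mapping a location to a stable cell ID.
  def cellId(lon: Double, lat: Double, cellSizeDegrees: Double): (Long, Long) =
    (math.floor(lon / cellSizeDegrees).toLong, math.floor(lat / cellSizeDegrees).toLong)

  /** Groups readings by cell ID and returns the mean signal per cell. */
  def meanSignalByCell(readings: Seq[Reading], cellSizeDegrees: Double): Map[(Long, Long), Double] =
    readings
      .groupBy(r => cellId(r.lon, r.lat, cellSizeDegrees))
      .map { case (id, rs) => id -> (rs.map(_.signal).sum / rs.size) }
}
```

Because every location in the same cell receives the same ID, the aggregation itself is an ordinary group-by on that ID; the same would presumably hold when the ID comes from the hexagon API at a fixed level.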

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
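To make the join semantics concrete, the following is a minimal sketch in plain Scala, without Spark or the LI jar. It is not the library's implementation: it compares every pair of points (brute force) and measures distance with the haversine formula on a spherical Earth, both of which are assumptions chosen for clarity. The Point class and the function names are hypothetical.

```scala
object DistanceJoinSketch {
  // Mean Earth radius in miles, used by the haversine formula.
  val EarthRadiusMiles = 3958.8

  // Hypothetical record type: an identifier plus a WGS84 longitude/latitude.
  final case class Point(id: String, lon: Double, lat: Double)

  /** Great-circle distance in miles between two longitude/latitude points. */
  def haversineMiles(lon1: Double, lat1: Double, lon2: Double, lat2: Double): Double = {
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusMiles * math.asin(math.sqrt(a))
  }

  /** Pairs each left point with every right point within maxMiles of it. */
  def joinByDistance(left: Seq[Point], right: Seq[Point], maxMiles: Double): Seq[(Point, Point, Double)] =
    for {
      l <- left
      r <- right
      d = haversineMiles(l.lon, l.lat, r.lon, r.lat)
      if d <= maxMiles
    } yield (l, r, d)
}
```

Each result row carries the matched pair and the computed distance, which mirrors the idea of a joined dataframe with an optional distance column. A brute-force comparison does not scale to large inputs, which is presumably why the real method takes a geohash precision for its primary join.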

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter Type Description

df2 DataFrame The dataframe to join to.

df1Longitude Column The longitude value from the first dataframe.

df1Latitude Column The latitude value from the first dataframe.

df2Longitude Column The longitude value from the second dataframe.

df2Latitude Column The latitude value from the second dataframe.

maxDistance Length The buffer length around point 1 to search for point 2.

geohashPrecision Int The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

options Map Optional. Options that add extra attributes to the result of the join.
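To show what a geohash precision means in practice, here is a small sketch of the standard public geohash encoding in plain Scala. It is not taken from the LI SDK, and how the library uses geohashes internally is not described in this guide; the object and function names are hypothetical. Each extra character subdivides the previous cell, so a higher precision produces smaller cells and more distinct keys.

```scala
object GeohashSketch {
  private val Base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

  /** Encodes a WGS84 longitude/latitude as a geohash of `precision` characters. */
  def encode(lon: Double, lat: Double, precision: Int): String = {
    require(precision >= 1 && precision <= 12, "precision must be between 1 and 12")
    var lonMin = -180.0
    var lonMax = 180.0
    var latMin = -90.0
    var latMax = 90.0
    val sb = new StringBuilder
    var useLon = true // bits alternate between longitude and latitude, longitude first
    var bits = 0      // number of bits collected for the current character
    var value = 0     // 5-bit value of the current character
    while (sb.length < precision) {
      if (useLon) {
        val mid = (lonMin + lonMax) / 2
        if (lon >= mid) { value = (value << 1) | 1; lonMin = mid }
        else { value = value << 1; lonMax = mid }
      } else {
        val mid = (latMin + latMax) / 2
        if (lat >= mid) { value = (value << 1) | 1; latMin = mid }
        else { value = value << 1; latMax = mid }
      }
      useLon = !useLon
      bits += 1
      if (bits == 5) {
        sb.append(Base32.charAt(value))
        bits = 0
        value = 0
      }
    }
    sb.toString
  }
}
```

A geohash at precision 5 is always a prefix of the geohash of the same point at precision 7, so raising the precision only refines the cell a point falls in.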


Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from the second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:

import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

import com.mapinfo.midev.unit.{Length, LinearUnit}

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the LI jar file.


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:

a) In Cloudera Manager, navigate to Hue -> Configuration.

b) Search for this property:

Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

c) Use the following 3 lines to set the property's value:

[desktop]
app_blacklist=
use_default_configuration=true

d) Click Save. Continue to step 3.

2. EMR - start here:

a) On the Master Node, open the file /etc/hue/conf/hue.ini.

b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file. Continue to step 3.

3. Restart Hue.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.


3 - Appendix

In this section

PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter Required Description

-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates PGD files for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. All service users that need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator Definition

Attribute operators =, <>, !=, <, <=, >, >=

Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.

Contains Returns true if the first object contains all of the second object.

Within Returns true if the first object is entirely inside the second object.

ContainsCentroid Returns true if the first object contains the centroid of the second object.

CentroidWithin Returns true if the first object's centroid is within the second object.

Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List) Returns true if the value equals at least one of the values in the literal list or subquery.

Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND Returns true if both conditions in the WHERE clause are true.

OR Returns true if either the first or second condition is true.

NOT Reverses the meaning of the logical operator with which it is used.
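The wildcard behavior of the Like operator can be illustrated with a short Scala sketch that translates a pattern into a regular expression. This is not how MI SQL evaluates Like internally; the object and function names are hypothetical, and case sensitivity and escape characters are not addressed here.

```scala
import java.util.regex.Pattern

object LikeSketch {
  /** Translates a Like pattern into an anchored regular expression. */
  def likeToRegex(pattern: String): String =
    pattern.map {
      case '_' => "."                       // exactly one character
      case '%' => ".*"                      // zero, one, or multiple characters
      case c   => Pattern.quote(c.toString) // any other character matches literally
    }.mkString("^", "", "$")

  /** Returns true if the whole value matches the Like pattern. */
  def like(value: String, pattern: String): Boolean =
    value.matches(likeToRegex(pattern))
}
```

For example, like("Canada", "Can%") and like("Canada", "C_nada") are both true, while like("Canada", "C_n") is false because the pattern must cover the whole value.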


Arithmetic Operators

Operator Definition

+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.

- Subtraction

* Multiplication

/ Division

^ Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

' ' String constant delimiters. See Quote Rules on page 85.

" " Quoted identifier delimiters

% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

, List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic is unable to parse them correctly, for example identifiers that contain spaces or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
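When query text is assembled in code, the quoting rules above can be applied with two small helpers. The following Scala sketch is illustrative only; the object and function names are hypothetical, and it does not handle identifiers that themselves contain a double quote, because this guide does not state a rule for that case.

```scala
object QuoteSketch {
  /** Wraps a value in single quotes, doubling any embedded single quote. */
  def stringLiteral(value: String): String =
    "'" + value.replace("'", "''") + "'"

  /** Wraps an identifier in double quotes (needed for spaces or special characters). */
  def quotedIdentifier(name: String): String =
    "\"" + name + "\""

  /** Builds a simple equality query from a table name, a column name, and a value. */
  def selectWhereEquals(table: String, column: String, value: String): String =
    s"SELECT * FROM ${quotedIdentifier(table)} WHERE $column = ${stringLiteral(value)}"
}
```

Doubling the embedded quote inside stringLiteral keeps a value such as a business name with an apostrophe from ending the literal early.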


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

Page 73: Spectrum Location Intelligence - Pitney Bowes · 2020-01-28 · Location Intelligence for Big Data 4.0 Spectrum Location Intelligence for Big Data User Guide 6. 2 -Spatial. This section

Spatial

Hexagons

A hexagon is an effective way to represent data related to circular wave propagation such as cell tower strength or noise pollution Closely approximating circles hexagons include edge data better than if rectangles were used Hexagons also fit together well across a space

Spectrumtrade Location Intelligence for Big Data provides API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis The compbhadoopcorehex package contains classes for working with hexagons and retrieving information about them Refer to Geohash Aggregation for details about using the API

The API provides an interface that assigns a hexagon and ID to each location and that ID is used to aggregate the data associated with the hexagon

One important hexagon parameter is the hexagon level This along with the longitude and latitude of a record is used to get the hexagon or its ID for the location

The hexagon level refers to a hierarchy of hexagons that divide the earths surface Level 1 refers to the whole earth Subsequent levels divide the previous level evenly into smaller units The smaller the number the higher the level and the larger the hexagon size These hexagons form a fixed network with each hexagon having a specific unique identifier (ID)

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 73

Spatial

Spectrumtrade Location Intelligence for Big Data supports Levels 1 through 11 with level 9 as the default Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator It will generate each hexagon with a unique and consistent ID at a given level for the same longitudelatitude

Spark API

The Spark2 jar contains both the Spark API and LI SDK API Assuming that the product was installed to pblisoftware as described in Installing the SDK on page 8 the jar filename and path is

pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar

For installation instructions see Location Intelligence Jar for Spatial Operations on page 77

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 74

Spatial

JoinByDistance

Description

joinByDistance is an implicit method which joins two dataframes taking longitude and latitude values one set from each dataframe representing the location of the records to be joined This method can be used to enrich a CSV containing point data with attributes associated with points within some max distance for example finding all the POIs within half of mile of each of your customers

Syntax

import compbbigdatalisparkapiSpatialImplicits_

joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int) DataFrame

joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int options Map[DistanceJoinOptionDistanceJoinOption Any]) DataFrame

Parameters

Note The coordinate values must be in the CoordSysConstantslongLatWGS84 coordinate system

Parameter Type Description

df2 DataFrame The dataframe to join to

df1Longitude Column The longitude value from the first dataframe

df1Latitude Column The latitude value from the first dataframe

df2Longitude Column The longitude value from the second dataframe

df2Latitude Column The latitude value from the second dataframe

maxDistance Length The buffer length around point 1 to search for point 2

geohashPrecision Integer The geohash precision to be used for the primary join Value must be between 1 and 12 The higher the number the more memory may be required

options Map Optional Options that add extra attributes to the result of the join

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 75

Spatial

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 5 mile buffer around each point in the first dataframe

val searchRadius = 05 val distanceUnit = mi val distance = new commapinfomidevunitLength(searchRadiustoDoublecommapinfomidevunitLinearUnitgetFromMapInfoCode(distanceUnit))

val resultDF = baseDFjoinByDistance(joinDF col(longitude) col(latitude) col(lon)col(lat) distance 7)

Example showing options set

val distance = new Length(05 LinearUnitgetFromMapInfoCode(mi))

val resultDF = baseDF(joinDF baseDF(Longitude) baseDF(Latitude) joinDF(Lon)joinDF(Lat) distance 7 Map(DistanceColumnName -gt outputDistance))

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 76

Spatial

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark a Location Intelligence jar file (LI jar) is available which supports compiling and running location intelligence programs

The LI jar also includes the Download Manager API which provides the capability to manage downloading reference data programmatically The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download() In addition the Download Manager can set permissions in the case where you have multiple users interacting with the data

bull For more information about the Download Manager API see the Javadocs located in pblisoftwarespark2sdkjavadocsdownloadmanager

There are several options for deploying the LI jar refer to the following table to proceed according to your use case

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.

2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the jar file.
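The command above can also be assembled from variables, which makes it easier to review before submitting. The following is a minimal sketch: build_submit_cmd is a hypothetical helper, not part of the SDK, and it only prints the command line rather than running it. The jar path, driver class, and driver jar are the placeholder values from the example in this section.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SDK): print a spark-submit command
# line for a driver that depends on the LI jar.
# Usage: build_submit_cmd <li_jar> <driver_class> <driver_jar> [driver args...]
build_submit_cmd() (
  li_jar=$1
  driver_class=$2
  driver_jar=$3
  shift 3
  # echo joins its arguments with single spaces; the remaining "$@" are
  # the driver parameters.
  echo spark-submit --class "$driver_class" --master yarn --deploy-mode cluster \
    --jars "$li_jar" "$driver_jar" "$@"
)

# Placeholder values taken from the example in this section.
build_submit_cmd \
  "/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar" \
  com.example.spark.app.MyDriver \
  /path/to/custom/driver.jar driverParameter1 driverParameter2
```

Printing the command first is a deliberate choice: the jar file name contains a version that differs per installation, so it is worth checking the resolved path before the job is submitted.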


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.

4. Open the Hue editor.

5. Select one of the query editors (Scala or Java).

6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.

7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).

8. In the session dialog, use the Add Property drop-down menu and select Jars.

9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This will recreate the session.

11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
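The EMR edit in step 2b can also be scripted. The following is a minimal sketch under stated assumptions: enable_default_configuration is a hypothetical helper name, the line in the file is assumed to be spelled exactly as shown in step 2b (optionally preceded by # characters and whitespace), and a .bak copy of the file is kept next to the original.

```shell
#!/bin/sh
# Sketch of EMR step 2b: uncomment use_default_configuration and set it to true.
# enable_default_configuration is a hypothetical helper, not part of Hue or the SDK.
enable_default_configuration() (
  ini=$1
  # Keep a .bak copy, then rewrite the (possibly commented-out) line in place.
  sed -i.bak \
    's/^[#[:space:]]*use_default_configuration=false/use_default_configuration=true/' \
    "$ini"
)

# Example (run as a user who is allowed to edit the file):
# enable_default_configuration /etc/hue/conf/hue.ini
```

Other lines in the file are left untouched, so the remaining steps of the procedure apply unchanged.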


3 - Appendix

In this section

• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter        Required    Description

-f <file>        yes         Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel>    no          The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

                             When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
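One way to pick a value for -p on a shared machine is to cap the CPU count at a fixed limit. The following is a minimal sketch: pgd_parallel is a hypothetical helper, the cap of 4 is only an illustration, and the PGDBuilder command is printed rather than executed because the way the utility is launched depends on your installation.

```shell
#!/bin/sh
# Hypothetical helper: choose a value for -p that never exceeds a cap.
# Usage: pgd_parallel <cap> [cpu_count]
pgd_parallel() (
  cap=$1
  # Use the given CPU count, else ask the OS, else fall back to 1.
  cpus=${2:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}
  if [ "$cpus" -gt "$cap" ]; then echo "$cap"; else echo "$cpus"; fi
)

# Print (not run) a PGDBuilder command capped at 4 concurrent threads.
echo "PGDBuilder -f seamless.tab -p $(pgd_parallel 4)"
```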


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
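After step 5, the result can be checked before any job is started. The following is a minimal sketch, assuming GNU stat (the -c option) as found on typical Linux cluster nodes; check_download_dir is a hypothetical helper, and the directory and group names in the commented example are the ones used in the steps above.

```shell
#!/bin/sh
# Hypothetical check: succeed only if <dir> has mode 775 and group <group>.
# Uses GNU stat (-c); other platforms need a different stat invocation.
# Usage: check_download_dir <dir> <group>
check_download_dir() (
  dir=$1
  group=$2
  mode=$(stat -c '%a' "$dir") || exit 2
  dir_group=$(stat -c '%G' "$dir") || exit 2
  [ "$mode" = "775" ] && [ "$dir_group" = "$group" ]
)

# Example:
# check_download_dir /pb/downloads pbdownloads && echo "download directory is ready"
```

Group membership of an individual service user can be inspected separately with the standard id command, for example id -nG hive.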


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator               Definition

Attribute operators    =, <>, !=, <, <=, >, >=

Between                Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects    Returns true if the envelopes (MBRs) of the operands intersect.

Contains               Returns true if the first object contains all of the second object.

Within                 Returns true if the first object is entirely inside the second object.

ContainsCentroid       Returns true if the first object contains the centroid of the second object.

CentroidWithin         Returns true if the first object's centroid is within the second object.

Intersects             Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List)              Returns true if the value equals at least one of the values in the literal list or subquery.

Like                   Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND                    Returns true if both conditions in the WHERE clause are true.

OR                     Returns true if either the first or second condition is true.

NOT                    Reverses the meaning of the logical operator with which it is used.
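The Like wildcards behave much like shell glob patterns, which gives a quick way to try out a pattern. The following is an illustration only, not how MI SQL evaluates Like: like_match is a hypothetical helper that maps _ to ? and % to *, and it assumes the pattern contains no characters that are special to the shell (such as *, ?, or [). Case handling in MI SQL may also differ.

```shell
#!/bin/sh
# Hypothetical illustration of Like wildcards using shell glob matching:
# _ (one character) maps to ?, and % (zero or more characters) maps to *.
# Usage: like_match <value> <like_pattern>
like_match() (
  value=$1
  glob=$(printf '%s' "$2" | tr '_%' '?*')
  # $glob is left unquoted on purpose so that it is treated as a pattern.
  case $value in
    $glob) exit 0 ;;
    *) exit 1 ;;
  esac
)

like_match "Canada" "Can%" && echo "Canada matches Can%"
like_match "cart" "c_t" || echo "cart does not match c_t"
```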


Arithmetic Operators

Operator    Definition

+           Addition; also concatenation operator. NOTE: String concatenation also uses &.

-           Subtraction

*           Multiplication

/           Division

^           Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter    Definition

( )          Expression delimiters

' '          String constant delimiters. See Quote Rules.

" "          Quoted identifier delimiters

% _          Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

,            List items and function argument separators

@            Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single-quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
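When a query is built in a script, the doubling of embedded single quotes can be applied automatically. The following is a minimal sketch: mi_sql_literal is a hypothetical helper, not part of the product, and it covers only the single-quote rule described above.

```shell
#!/bin/sh
# Hypothetical helper: wrap a value in single quotes for use as an MI SQL
# string literal, doubling any embedded single quote.
# Usage: mi_sql_literal <value>
mi_sql_literal() (
  escaped=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "'%s'\n" "$escaped"
)

# Prints: SELECT * FROM Streets WHERE Business = 'O''haras'
echo "SELECT * FROM Streets WHERE Business = $(mi_sql_literal "O'haras")"
```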


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

  • Table of Contents
  • Welcome
    • What is Spectrum™ Location Intelligence for Big Data
    • Spectrum™ Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
  • Spatial
    • Installing the SDK
    • Hive User-Defined Spatial Functions
      • Setup
      • WritableGeometry
      • Geometry Functions
        • Constructor Functions
          • FromGeoJSON
          • FromWKT
          • FromWKB
          • FromKML
          • ST_Point
        • Persistence Functions
          • ToGeoJSON
          • ToWKT
          • ToWKB
          • ToKML
        • Predicate Functions
          • Disjoint
          • Intersects
          • Overlaps
          • Within
          • IsNullGeometry
        • Measurement Functions
          • Area
          • ClosestPoints
          • Distance
          • Length
          • Perimeter
        • Processing Functions
          • Buffer
          • ConvexHull
          • Intersection
          • Transform
          • Union
        • Observer Functions
          • ST_X
          • ST_XMax
          • ST_XMin
          • ST_Y
          • ST_YMax
          • ST_YMin
        • Grid Functions
          • GeoHashBoundary
          • GeoHashID
          • HexagonBoundary
          • HexagonID
          • SquareHashBoundary
          • SquareHashID
      • Search Functions
        • LocalPointInPolygon
        • LocalSearchNearest
    • Spark
      • Spark Jobs
        • Hexagon Generator
          • Hexagons
      • Spark API
        • JoinByDistance
      • Location Intelligence Jar for Spatial Operations
        • Installing and setting up the Location Intelligence Jar for a Spark Job
        • Integrating the Location Intelligence Jar with Zeppelin
        • Integrating the Location Intelligence Jar with Hue
  • Appendix
    • PGD Builder
      • Building an Index with the PGD Builder
    • Download Permissions
    • Operators and Syntax Delimiters
      • Quote Rules

Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.

Spark API

The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK, the jar filename and path is:

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

For installation instructions, see Location Intelligence Jar for Spatial Operations.

Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.


JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter           Type         Description

df2                 DataFrame    The dataframe to join to.

df1Longitude        Column       The longitude value from the first dataframe.

df1Latitude         Column       The latitude value from the first dataframe.

df2Longitude        Column       The longitude value from the second dataframe.

df2Latitude         Column       The latitude value from the second dataframe.

maxDistance         Length       The buffer length around point 1 to search for point 2.

geohashPrecision    Integer      The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.

options             Map          Optional. Options that add extra attributes to the result of the join.
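To sanity-check a few rows of a join result by hand, the maxDistance condition can be approximated outside Spark. The following is a rough sketch, not the SDK's algorithm: within_miles is a hypothetical helper that uses the haversine great-circle formula with a mean Earth radius of 3958.8 miles, so results very close to the threshold may differ from what joinByDistance returns.

```shell
#!/bin/sh
# Hypothetical check (not the SDK's algorithm): is point 2 within max_miles
# of point 1? Uses the haversine great-circle formula on WGS84 degrees.
# Usage: within_miles <lon1> <lat1> <lon2> <lat2> <max_miles>
within_miles() (
  awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" -v max="$5" 'BEGIN {
    pi = atan2(0, -1)
    r = 3958.8                                  # mean Earth radius in miles
    dlat = (lat2 - lat1) * pi / 180
    dlon = (lon2 - lon1) * pi / 180
    a = sin(dlat / 2) ^ 2 + cos(lat1 * pi / 180) * cos(lat2 * pi / 180) * sin(dlon / 2) ^ 2
    d = 2 * r * atan2(sqrt(a), sqrt(1 - a))
    if (d <= max) exit 0                        # within the search radius
    exit 1
  }'
)

# Two points about 0.35 miles apart on the equator, checked against 0.5 miles.
within_miles 0 0 0.005 0 0.5 && echo "within 0.5 miles"
```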


Options

Key                   Type      Description

DistanceColumnName    String    Adds a column to the result dataframe that contains the distance calculated.

Return Values

Return Type    Description

DataFrame      The dataframe that is the result of the join.

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 5 mile buffer around each point in the first dataframe

val searchRadius = 05 val distanceUnit = mi val distance = new commapinfomidevunitLength(searchRadiustoDoublecommapinfomidevunitLinearUnitgetFromMapInfoCode(distanceUnit))

val resultDF = baseDFjoinByDistance(joinDF col(longitude) col(latitude) col(lon)col(lat) distance 7)

Example showing options set

val distance = new Length(05 LinearUnitgetFromMapInfoCode(mi))

val resultDF = baseDF(joinDF baseDF(Longitude) baseDF(Latitude) joinDF(Lon)joinDF(Lat) distance 7 Map(DistanceColumnName -gt outputDistance))

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 76

Spatial

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark a Location Intelligence jar file (LI jar) is available which supports compiling and running location intelligence programs

The LI jar also includes the Download Manager API which provides the capability to manage downloading reference data programmatically The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download() In addition the Download Manager can set permissions in the case where you have multiple users interacting with the data

bull For more information about the Download Manager API see the Javadocs located in pblisoftwarespark2sdkjavadocsdownloadmanager

There are several options for deploying the LI jar refer to the following table to proceed according to your use case

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1 Deploy the spectrum-bigdata-li-sdk-spark2-versionjar to the Hadoop cluster 2 Start the Spark job using the following command

Note This example assumes the product is installed to pblisoftware as described in Installing the SDK on page 8

spark-submit --class comexamplesparkappMyDriver--master yarn --deploy-mode cluster--jars pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar pathtocustomdriverjar driverParameter1 driverParameter2

where

bull --jars The path to the jar file

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 77

Spatial

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin

1 On the navigation bar using the Settings drop-down menu and click Interpreter The Interpreters page displays

2 Go to the Spark2 section click Edit The editing view displays

3 Go to the Dependencies section Enter the full local path to the jar in the artifact field

Note This example assumes the product is installed to pblisoftware as described in Installing the SDK on page 8

pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar

4 Click Save This will restart the interpreter with the loaded library

5 To return to your notebook click the Notebook drop-down menu and select your notebook The LI jar is now available for all notebooks

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 78

Spatial

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite To use the LI jar in an interactive manner you will need to use the Scala or Java interpreter which requires Livy

Proceed according to your platform

1 Cloudera - start here a) In Cloudera Manager navigate to Hue -gt Configuration b) Search for this property

Hue Service Advanced Configuration Snippet (Safety Valve) forhue_safety_valveini

c) Use the following 3 lines to set the propertys value

[desktop]app_blacklist=

use_default_configuration=true

d) Click Save Continue to step 3

2 EMR - start here a) On the Master Node open the file etchueconfhueini b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file Continue to step 3

3 Restart Hive 4 Open the Hue editor 5 Select one of the query editors (Scala or Java) 6 Establish a session This can be accomplished by running any operation in the editor executing

1+1 will suffice 7 When the query completes click the gears icon button (in some versions of Hue this button

becomes visible after you click the vertical ellipsis button) 8 In the session dialog use the Add Property drop-down menu and select Jars 9 In the text box specify the location in HDFS where your Routing jar is located For example

hdfspblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar

10 Click Recreate This will recreate the session 11 To return to your notebook click any of the arrow icons or the gears button again This will close

any open dialogs and add the LI jar to the session that was just created

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 79

3 - Appendix

In this section

PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-ltversiongtzip distribution file Because environments and use cases vary widely we recommend you evaluate the PGD index against a sample to verify that it improves performance

Note A PGD index file is a supplemental file to the TAB file set It is 5-6 times larger than the MAP file for the TAB One PGD file is generated per TAB file except in the case of a seamless TAB which will have PGD files created for each sub-TAB

Also a PGD file will no longer be used by the system if you change the data in the TAB (that is if rows have been added or deleted or a geometry has been changed in the MAP portion of the TAB) In this case you must you must then regenerate the PGD for the updated TAB file

Building an Index with the PGD Builder

Description

This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB

Usage

PGDBuilder -f ltfilegt [-p ltparallelgt]

Parameters

Parameter Required Description

-f ltfilegt yes Path and file name for the TAB file for which you are generating a PGD file

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 81

Appendix

Parameter Required Description

-p ltparallelgt no The maximum number of concurrent processes to run The default is the number of CPUs available on the machine

When generating PGD files for a seamless TAB one thread is used for each sub-TAB and these threads are run concurrently In this case consider setting a lower number to prevent performance issues with your machine

Examples

This request will generate a PGD file for the uktab file

PGDBuilder -f Cdatauktab

This request will generate a PGD file for a seamless TAB file limited to 4 concurrent threads

PGDBuilder -f Cseamlesstab -p 4

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 82

Appendix

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group

Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue

You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt

1 Add the group

sudo groupadd pbdownloads

2 Add users to the group

sudo usermod -a -G pbdownloads hive sudo usermod -a -G pbdownloads yarn sudo usermod -a -G pbdownloads zeppelin sudo usermod -a -G pbdownloads hue sudo usermod -a -G pbdownloads pbuser sudo usermod -a -G pbdownloads ltmyOtherUsergt

3 Using a window where no job is running restart all the services whose operating system users were added to the new group

4 Using a window where no job is running restart the session of all the operating system users that were added to new group (for example pbuser)

5 Update the group to the common operating system group and update permissions to 775 for the download directory specified in pbdownloadlocation property

sudo chgrp pbdownloads pbdownloadssudo chmod 775 pbdownloads

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 83

Appendix

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators = lt gt = lt lt= gt gt=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first objects centroid is within the second object

Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object

In (List) Returns true if equals at least one of the values in the literal list or sub query

Like Returns true if the value can be compared to similar values using wildcard characters There are two wildcards used in conjunction with the Like operator Underscore _ and Percent The underscore represents a single number or character The percent sign represents zero one or multiple characters The symbols can be used in combination



JoinByDistance

Description

joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.

Syntax

import com.pb.bigdata.li.spark.api.SpatialImplicits._

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame

joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame

Parameters

Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around each point in the first dataframe in which to search for points in the second dataframe.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.

Options

Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type | Description
DataFrame | The dataframe that is the result of the join.
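
To get a feel for what a given geohashPrecision implies, the following Scala sketch computes the approximate cell size of a standard geohash of a given length. It assumes the conventional base-32 geohash layout (5 bits per character, with longitude taking the extra bit when the total is odd). How the SDK uses the precision internally is not described in this guide and may differ. The object and method names are hypothetical.

```scala
object GeohashCells {
  /** Approximate cell size in degrees, as (width, height), for a standard geohash of the given length. */
  def cellSizeDegrees(precision: Int): (Double, Double) = {
    require(precision >= 1 && precision <= 12, "precision must be between 1 and 12")
    val totalBits = 5 * precision      // each base-32 character encodes 5 bits
    val lonBits = (totalBits + 1) / 2  // longitude receives the extra bit when the total is odd
    val latBits = totalBits / 2
    (360.0 / (1L << lonBits).toDouble, 180.0 / (1L << latBits).toDouble)
  }
}
```

Under these assumptions, a precision of 7 gives cells of about 0.0014 degrees on each side, which can help when judging a precision against the intended maxDistance.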

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))

val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)

Example showing options set:

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))

val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
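
Conceptually, a distance join pairs each record in the first dataset with every record in the second dataset that lies within the maximum distance. The following self-contained Scala sketch illustrates that idea on in-memory collections, using the haversine formula and a naive all-pairs comparison. It is not the SDK's implementation, which joins Spark dataframes and uses a geohash for the primary join. The Point class, the Earth radius constant, and the method names are assumptions made for illustration only.

```scala
object DistanceJoinSketch {
  final case class Point(id: String, lon: Double, lat: Double)

  // Mean Earth radius in miles; an approximation chosen for this sketch.
  private val EarthRadiusMiles = 3958.8

  /** Great-circle (haversine) distance in miles between two lon/lat points given in degrees. */
  def haversineMiles(a: Point, b: Point): Double = {
    val dLat = math.toRadians(b.lat - a.lat)
    val dLon = math.toRadians(b.lon - a.lon)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(a.lat)) * math.cos(math.toRadians(b.lat)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusMiles * math.asin(math.sqrt(h))
  }

  /** Naive all-pairs join: keeps (left, right, distance) whenever distance <= maxMiles. */
  def naiveJoinByDistance(left: Seq[Point], right: Seq[Point], maxMiles: Double): Seq[(Point, Point, Double)] =
    for {
      l <- left
      r <- right
      d = haversineMiles(l, r)
      if d <= maxMiles
    } yield (l, r, d)
}
```

The third element of each result tuple plays the same role as the column requested with the DistanceColumnName option. An all-pairs comparison like this does not scale to large datasets, which is why a distributed join needs a coarser first pass.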


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.

There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.

Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the LI jar file.

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

4. Click Save. This restarts the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform:

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar

10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.

3 - Appendix

In this section

PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have a PGD file created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.

When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4


Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that includes every service user that needs to download the data. For example, if Hive and YARN jobs must download data and use the same download location, then both the Hive and YARN operating system users should belong to a common operating system group. The download directory should be assigned to this common group, with Read, Write, and Execute permissions for the owner and group (775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the session of every operating system user that was added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
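
After changing the group and mode, it can be useful to confirm the result from code before a job attempts a download. The following Scala sketch uses only the standard java.nio.file API to check a directory's permission bits and owning group on a POSIX file system. The object and method names are hypothetical and are not part of the SDK or the Download Manager API.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.{PosixFileAttributes, PosixFilePermissions}

object DownloadDirCheck {
  /** True when the directory's permission bits equal the symbolic mode, for example "rwxrwxr-x" for 775. */
  def hasMode(dir: Path, symbolicMode: String): Boolean =
    Files.getPosixFilePermissions(dir) == PosixFilePermissions.fromString(symbolicMode)

  /** Name of the group that owns the directory. */
  def groupOf(dir: Path): String =
    Files.readAttributes(dir, classOf[PosixFileAttributes]).group().getName
}
```

For the setup described above, one would expect hasMode to return true for "rwxrwxr-x" and groupOf to return pbdownloads when called on the download directory.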


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
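
The wildcard semantics of the Like operator can be illustrated by translating a pattern into a regular expression. The following Scala sketch does that with the standard java.util.regex API. It is an illustration of the two wildcards only, not the MI SQL implementation; case-sensitive matching and the absence of an escape character are assumptions, and the object and method names are hypothetical.

```scala
import java.util.regex.Pattern

object LikeSketch {
  /** Converts a Like pattern to a regex: % becomes .*, _ becomes ., and every other character is literal. */
  def toRegex(likePattern: String): Pattern = {
    val sb = new StringBuilder
    likePattern.foreach {
      case '%' => sb.append(".*")
      case '_' => sb.append('.')
      case c   => sb.append(Pattern.quote(c.toString))
    }
    Pattern.compile(sb.toString, Pattern.DOTALL)
  }

  /** True when the whole value matches the Like pattern. */
  def matches(value: String, likePattern: String): Boolean =
    toRegex(likePattern).matcher(value).matches()
}
```

For example, LikeSketch.matches("Canada", "Can%") and LikeSketch.matches("Canada", "C_nada") both hold, while LikeSketch.matches("Canada", "C_da") does not, because the underscore stands for exactly one character.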

Arithmetic Operators

Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
%, _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.
, | List items and function argument separators
@ | Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

When a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
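
When query text is assembled in code, the quoting rules above can be applied with two small helpers. The following Scala sketch wraps a value in single quotes, doubling any embedded single quote, and wraps an identifier in double quotes. The object and method names are hypothetical, and identifiers that themselves contain double quotes are deliberately rejected because this guide does not describe how they would be escaped.

```scala
object MiSqlQuoting {
  /** Wraps a value in single quotes and doubles every embedded single quote. */
  def quoteLiteral(value: String): String =
    "'" + value.replace("'", "''") + "'"

  /** Wraps an identifier in double quotes; embedded double quotes are not handled in this sketch. */
  def quoteIdentifier(name: String): String = {
    require(!name.contains("\""), "embedded double quotes are not handled in this sketch")
    "\"" + name + "\""
  }
}
```

With these helpers, MiSqlQuoting.quoteLiteral("O'hara's") yields 'O''hara''s', which can then be placed after the = sign in a WHERE clause.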

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved

Page 76: Spectrum Location Intelligence - Pitney Bowes · 2020-01-28 · Location Intelligence for Big Data 4.0 Spectrum Location Intelligence for Big Data User Guide 6. 2 -Spatial. This section

Spatial

Options

Key Type Description

DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated

Return Values

Return Type Description

DataFrame The dataframe that is the result of the join

Examples

This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 5 mile buffer around each point in the first dataframe

val searchRadius = 05 val distanceUnit = mi val distance = new commapinfomidevunitLength(searchRadiustoDoublecommapinfomidevunitLinearUnitgetFromMapInfoCode(distanceUnit))

val resultDF = baseDFjoinByDistance(joinDF col(longitude) col(latitude) col(lon)col(lat) distance 7)

Example showing options set

val distance = new Length(05 LinearUnitgetFromMapInfoCode(mi))

val resultDF = baseDF(joinDF baseDF(Longitude) baseDF(Latitude) joinDF(Lon)joinDF(Lat) distance 7 Map(DistanceColumnName -gt outputDistance))

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 76

Spatial

Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark a Location Intelligence jar file (LI jar) is available which supports compiling and running location intelligence programs

The LI jar also includes the Download Manager API which provides the capability to manage downloading reference data programmatically The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download() In addition the Download Manager can set permissions in the case where you have multiple users interacting with the data

bull For more information about the Download Manager API see the Javadocs located in pblisoftwarespark2sdkjavadocsdownloadmanager

There are several options for deploying the LI jar refer to the following table to proceed according to your use case

Application Do this

Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps

1 Deploy the spectrum-bigdata-li-sdk-spark2-versionjar to the Hadoop cluster 2 Start the Spark job using the following command

Note This example assumes the product is installed to pblisoftware as described in Installing the SDK on page 8

spark-submit --class comexamplesparkappMyDriver--master yarn --deploy-mode cluster--jars pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar pathtocustomdriverjar driverParameter1 driverParameter2

where

bull --jars The path to the jar file

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 77

Spatial

Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library

Perform the following steps to use the LI jar in Zeppelin

1 On the navigation bar using the Settings drop-down menu and click Interpreter The Interpreters page displays

2 Go to the Spark2 section click Edit The editing view displays

3 Go to the Dependencies section Enter the full local path to the jar in the artifact field

Note This example assumes the product is installed to pblisoftware as described in Installing the SDK on page 8

pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar

4 Click Save This will restart the interpreter with the loaded library

5 To return to your notebook click the Notebook drop-down menu and select your notebook The LI jar is now available for all notebooks

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 78

Spatial

Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library

Prerequisite To use the LI jar in an interactive manner you will need to use the Scala or Java interpreter which requires Livy

Proceed according to your platform

1 Cloudera - start here a) In Cloudera Manager navigate to Hue -gt Configuration b) Search for this property

Hue Service Advanced Configuration Snippet (Safety Valve) forhue_safety_valveini

c) Use the following 3 lines to set the propertys value

[desktop]app_blacklist=

use_default_configuration=true

d) Click Save Continue to step 3

2 EMR - start here a) On the Master Node open the file etchueconfhueini b) Uncomment and change the following line from

use_default_configuration=false

to

use_default_configuration=true

c) Save the file Continue to step 3

3 Restart Hive 4 Open the Hue editor 5 Select one of the query editors (Scala or Java) 6 Establish a session This can be accomplished by running any operation in the editor executing

1+1 will suffice 7 When the query completes click the gears icon button (in some versions of Hue this button

becomes visible after you click the vertical ellipsis button) 8 In the session dialog use the Add Property drop-down menu and select Jars 9 In the text box specify the location in HDFS where your Routing jar is located For example

hdfspblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar

10 Click Recreate This will recreate the session 11 To return to your notebook click any of the arrow icons or the gears button again This will close

any open dialogs and add the LI jar to the session that was just created

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 79

3 - Appendix

In this section

PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-ltversiongtzip distribution file Because environments and use cases vary widely we recommend you evaluate the PGD index against a sample to verify that it improves performance

Note A PGD index file is a supplemental file to the TAB file set It is 5-6 times larger than the MAP file for the TAB One PGD file is generated per TAB file except in the case of a seamless TAB which will have PGD files created for each sub-TAB

Also a PGD file will no longer be used by the system if you change the data in the TAB (that is if rows have been added or deleted or a geometry has been changed in the MAP portion of the TAB) In this case you must you must then regenerate the PGD for the updated TAB file

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter        Required   Description

-f <file>        yes        Path and file name for the TAB file for which you are generating a PGD file.

-p <parallel>    no         The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request will generate a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
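
Because a PGD file is ignored once the data in its TAB changes, it can help to script a check that decides when to rebuild. The following is a minimal sketch, not part of the product. It assumes that the index sits next to the TAB file and carries a .pgd extension (an assumption; confirm the file name that PGDBuilder produces in your environment), and the helper name needs_pgd_rebuild is hypothetical. File modification times are used only as a proxy for "the data changed". For a seamless TAB, the same check would apply to each sub-TAB.

```shell
# Hypothetical helper: succeeds (exit status 0) when the index should be rebuilt.
# Assumption: the index is named like the TAB file, but with a .pgd extension.
needs_pgd_rebuild() {
    pgd_tab=$1
    pgd_base=${pgd_tab%.*}
    pgd_index="$pgd_base.pgd"
    # No index yet: one has to be built.
    [ -e "$pgd_index" ] || return 0
    # Any file of the TAB set (same base name) newer than the index: rebuild.
    for pgd_member in "$pgd_base".*; do
        [ "$pgd_member" = "$pgd_index" ] && continue
        [ "$pgd_member" -nt "$pgd_index" ] && return 0
    done
    return 1
}

# Usage sketch (placeholder path), capped at 4 concurrent processes:
# if needs_pgd_rebuild /data/uk.tab; then PGDBuilder -f /data/uk.tab -p 4; fi
```

The -p value in the usage line is only an illustration of setting a number lower than the CPU count, as suggested in the parameter description.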


Download Permissions

Setting the download permissions allows multiple services to download the data and to update the downloaded data when required. Create a common operating system group that contains every service user who needs to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have 775 permissions, which grant Read, Write, and Execute to the owner and group (and Read and Execute to all other users).

Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no job is running, restart all the services whose operating system users were added to the new group.

4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
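
After step 5, the result can be checked before any job relies on it. The following is a minimal sketch under stated assumptions: the function name download_dir_ready is hypothetical, and it relies on the GNU coreutils form of stat (the -c option differs on BSD and macOS). It only confirms the group and the 775 mode of the directory; it does not confirm that each service user belongs to the group.

```shell
# Hypothetical check: succeeds when directory $1 has group $2 and mode 775.
# Uses GNU coreutils stat; adjust the format option on other platforms.
download_dir_ready() {
    dl_dir=$1
    dl_group=$2
    [ -d "$dl_dir" ] || return 1
    # %G prints the group name of the directory.
    [ "$(stat -c '%G' "$dl_dir")" = "$dl_group" ] || return 1
    # %a prints the permission bits in octal.
    [ "$(stat -c '%a' "$dl_dir")" = "775" ] || return 1
}

# Usage sketch (placeholder directory):
# download_dir_ready /pb/downloads pbdownloads && echo "download directory is ready"
```

Group membership of an individual user can be inspected separately with the standard id command, for example id -nG hive.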


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator              Definition

Attribute operators   =, <>, !=, <, <=, >, >=

Between               Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects   Returns true if the envelopes (MBRs) of the operands intersect.

Contains              Returns true if the first object contains all of the second object.

Within                Returns true if the first object is entirely inside the second object.

ContainsCentroid      Returns true if the first object contains the centroid of the second object.

CentroidWithin        Returns true if the first object's centroid is within the second object.

Intersects            Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.

In (List)             Returns true if the value equals at least one of the values in the literal list or subquery.

Like                  Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND                   Returns true if both conditions in the WHERE clause are true.

OR                    Returns true if either the first or second condition is true.

NOT                   Reverses the meaning of the logical operator with which it is used.
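
The two Like wildcards behave much like shell globs, which makes them easy to experiment with outside the product. The following is a rough sketch, not the MI SQL implementation: the function name like_match is hypothetical, matching here is case-sensitive (the source does not say how MI SQL treats case), and the pattern is assumed to contain no shell glob characters of its own (*, ?, [ or backslash).

```shell
# Hypothetical illustration of Like semantics using shell pattern matching.
# % (zero, one, or more characters) becomes *, and _ (one character) becomes ?.
like_match() {
    like_glob=$(printf '%s' "$2" | sed -e 's/%/*/g' -e 's/_/?/g')
    case $1 in
        $like_glob) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage sketch:
# like_match "Canada" "Can%"   succeeds: % matches "ada"
# like_match "Canada" "C_nada" succeeds: _ matches the single character "a"
# like_match "Canada" "C_da"   fails:    _ cannot match the three characters "ana"
```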


Arithmetic Operators

Operator   Definition

+          Addition; also the concatenation operator. Note: String concatenation also uses &.

-          Subtraction

*          Multiplication

/          Division

^          Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter   Definition

( )         Expression delimiters

' '         String constant delimiters. See Quote Rules on page 85.

" "         Quoted identifier delimiters

% _         Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character

,           List items and function argument separators

@           Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.

Examples

Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal Ohara's is defined:

SELECT * FROM Streets WHERE Business = 'Ohara''s'
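
When queries are assembled in a script, the quoting rules can be applied mechanically. The following is a minimal sketch with hypothetical helper names (mi_sql_literal and mi_sql_identifier); it is not part of the product. The identifier helper simply wraps its argument in double quotes and makes no attempt to handle a double quote inside the identifier, because the source does not describe that case.

```shell
# Hypothetical helpers that apply the quote rules described above.
mi_sql_literal() {
    # Double every single quote, then wrap the value in single quotes.
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

mi_sql_identifier() {
    # Wrap the identifier in double quotes (embedded double quotes are not handled).
    printf '"%s"' "$1"
}

# Usage sketch with placeholder table, column, and value:
# printf 'SELECT * FROM %s WHERE Business = %s\n' \
#     "$(mi_sql_identifier "My Streets")" "$(mi_sql_literal "Joe's Diner")"
```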


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved


Location Intelligence Jar for Spatial Operations

If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.

The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.

• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager

There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.

Application         Do this

Spark job           Installing and setting up the Location Intelligence Jar for a Spark Job on page 77

Zeppelin notebook   Integrating the Location Intelligence Jar with Zeppelin on page 78

Hue notebook        Integrating the Location Intelligence Jar with Hue on page 79

Installing and setting up the Location Intelligence Jar for a Spark Job

Perform the following steps:

1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

spark-submit --class com.example.spark.app.MyDriver \
--master yarn --deploy-mode cluster \
--jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar \
/path/to/custom/driver.jar driverParameter1 driverParameter2

where:

• --jars: The path to the LI jar file.
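
Before submitting to the cluster, it can be useful to print the assembled command and review it. The following is a dry-run sketch only: the function name li_submit_cmd and every path and class name in the usage line are placeholders, not values from the product. It uses the same spark-submit options as the command above and does not quote arguments that contain spaces.

```shell
# Hypothetical dry-run helper: prints the spark-submit command instead of running it.
# Arguments: <LI jar path> <driver jar path> <driver class> [driver parameters...]
li_submit_cmd() {
    li_jar=$1
    driver_jar=$2
    driver_class=$3
    shift 3
    printf 'spark-submit --class %s --master yarn --deploy-mode cluster --jars %s %s' \
        "$driver_class" "$li_jar" "$driver_jar"
    # Driver parameters follow the application jar.
    for driver_param in "$@"; do
        printf ' %s' "$driver_param"
    done
    printf '\n'
}

# Usage sketch with placeholder values:
# li_submit_cmd /opt/example/li-sdk.jar /opt/example/my-driver.jar com.example.MyDriver input output
```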


Integrating the Location Intelligence Jar with Zeppelin

The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.

Perform the following steps to use the LI jar in Zeppelin:

1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.

2. Go to the Spark2 section and click Edit. The editing view displays.

3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.

Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.

/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

4. Click Save. This will restart the interpreter with the loaded library.

5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.


Integrating the Location Intelligence Jar with Hue

The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.

Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.

Proceed according to your platform.

1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:

      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

   c) Use the following 3 lines to set the property's value:

      [desktop]
      app_blacklist=
      use_default_configuration=true

   d) Click Save. Continue to step 3.

2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from

      use_default_configuration=false

      to

      use_default_configuration=true

   c) Save the file. Continue to step 3.

3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:

   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar

10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
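
The EMR edit in step 2 can also be scripted. The following is a minimal sketch under stated assumptions: the function name enable_hue_default_configuration is hypothetical, the line is assumed to be commented out with # or ; characters (possibly indented), and the rewritten line loses its indentation. It keeps a backup copy with a .bak suffix next to the file.

```shell
# Hypothetical helper: uncomments use_default_configuration and sets it to true.
# Argument: path to the ini file (for EMR, the hue.ini file named in step 2).
enable_hue_default_configuration() {
    hue_ini=$1
    # Keep a backup of the original file before rewriting it.
    cp "$hue_ini" "$hue_ini.bak" || return 1
    # Strip leading comment markers and whitespace, then force the value to true.
    sed 's/^[#;[:space:]]*use_default_configuration[[:space:]]*=.*$/use_default_configuration=true/' \
        "$hue_ini.bak" > "$hue_ini"
}

# Usage sketch (run with sufficient privileges on the master node):
# enable_hue_default_configuration /etc/hue/conf/hue.ini
```

Every other line of the file is written back unchanged, so the section the setting belongs to is preserved.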




3 - Appendix

In this section

PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have a PGD file created for each sub-TAB.

Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
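Because a PGD file is 5-6 times larger than the MAP file, it can help to estimate the extra disk space before building an index. The following is a minimal sketch of that arithmetic only; the function name is hypothetical and is not part of the product.

```python
import os


def estimate_pgd_size_bytes(map_path):
    """Return a (low, high) size estimate in bytes for the PGD of one TAB.

    Applies the 5-6x multiplier from the note above to the size of the
    TAB's MAP file. This is an estimate, not a measured value.
    """
    map_size = os.path.getsize(map_path)
    return 5 * map_size, 6 * map_size
```

For a seamless TAB, the estimate would be computed for the MAP file of each sub-TAB and summed, since one PGD file is created per sub-TAB.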

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter      Required  Description
-f <file>      yes       Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel>  no        The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
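When indexes are built from a script, the command line can be assembled programmatically. The sketch below uses only the -f and -p options described above. It assumes the PGDBuilder executable is on the PATH; the function names and the rule of reserving one CPU are illustrative assumptions, not part of the product.

```python
import os
import subprocess


def pgd_builder_command(tab_path, parallel=None, executable="PGDBuilder"):
    """Build the argument list for one PGDBuilder run.

    'executable' is an assumption: adjust it to wherever the utility
    lives in your unpacked distribution.
    """
    command = [executable, "-f", tab_path]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        command += ["-p", str(parallel)]
    return command


def capped_parallelism(sub_tab_count, reserve=1):
    """Pick a -p value for a seamless TAB.

    Hypothetical policy: never exceed the number of sub-TABs, and leave
    'reserve' CPUs free for other work on the machine.
    """
    cpus = os.cpu_count() or 1
    return max(1, min(sub_tab_count, cpus - reserve))


def build_index(tab_path, parallel=None):
    """Run PGDBuilder and raise CalledProcessError if it fails."""
    subprocess.run(pgd_builder_command(tab_path, parallel), check=True)
```

For example, pgd_builder_command(r"C:\seamless.tab", 4) yields the same arguments as the second example above.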


Download Permissions

Setting the download permissions allows multiple services to download the data and update it when required. Create a common operating system group that includes every service user who needs to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be members of that group. The download directory should be assigned to the common operating system group and have 775 permissions, which grant Read, Write, and Execute to the owner and the group.

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group.

sudo groupadd pbdownloads

2. Add users to the group.

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.

4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).

5. Set the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775.

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
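After completing these steps, the result can be verified from a script. The following is a minimal sketch, assuming a Linux host; the function names are hypothetical, and the directory and group you pass in are whatever you configured above.

```python
import grp
import os


def group_name_of(path):
    """Return the name of the group that owns 'path'.

    Falls back to the numeric gid as a string if the gid has no entry
    in the group database.
    """
    gid = os.stat(path).st_gid
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def download_dir_problems(path, group_name, expected_mode=0o775):
    """List the ways 'path' deviates from the expected group and mode."""
    problems = []
    actual_group = group_name_of(path)
    if actual_group != group_name:
        problems.append(
            "group is %s, expected %s" % (actual_group, group_name)
        )
    # Compare only the rwx bits for owner, group, and others.
    actual_mode = os.stat(path).st_mode & 0o777
    if actual_mode != expected_mode:
        problems.append(
            "mode is %o, expected %o" % (actual_mode, expected_mode)
        )
    return problems
```

An empty list from download_dir_problems("/pb/downloads", "pbdownloads") indicates that the directory matches the settings described in step 5.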


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator             Definition
Attribute operators  =, <>, !=, <, <=, >, >=
Between              Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects  Returns true if the envelopes (MBRs) of the operands intersect.
Contains             Returns true if the first object contains all of the second object.
Within               Returns true if the first object is entirely inside the second object.
ContainsCentroid     Returns true if the first object contains the centroid of the second object.
CentroidWithin       Returns true if the first object's centroid is within the second object.
Intersects           Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)            Returns true if the value equals at least one of the values in the literal list or subquery.
Like                 Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                  Returns true if both conditions in the WHERE clause are true.
OR                   Returns true if either the first or second condition is true.
NOT                  Reverses the meaning of the logical operator with which it is used.
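The wildcard and range semantics in the table can be illustrated outside the SQL engine. The sketch below translates a Like pattern into a regular expression and models Between as an inclusive comparison. It is an illustration only: the function names are hypothetical, and the matching here is case-sensitive, which is an assumption that this guide does not confirm for MI SQL.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    '%' matches zero, one, or multiple characters; '_' matches exactly
    one character; every other character matches itself literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, low, high):
    """Inclusive range test: both end points count as inside the range."""
    return low <= value <= high
```

For example, like("Troy", "Tr_y") and like("Troy", "T%") are both true, while like("Troy", "T_") is false because the underscore stands for exactly one character.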


Arithmetic Operators

Operator  Definition
+         Addition; also the concatenation operator. Note: String concatenation also uses &.
-         Subtraction
*         Multiplication
/         Division
^         Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter  Definition
( )        Expression delimiters
' '        String constant delimiters. See Quote Rules.
" "        Quoted identifier delimiters
% _        Wildcard symbols. % represents zero, one, or more characters; the _ (underscore) represents a single character.
,          List item and function argument separators
@          Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain characters that would not normally be allowed (for example, the slashes in the table name below) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a doubled single quote (two characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
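When query text is assembled in application code, these quoting rules can be applied with small helper functions. The following is a minimal sketch; the helper names are hypothetical. Rejecting identifiers that contain a double quote is an assumption made here, because this guide does not describe how to escape one.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded ones."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Assumption: identifiers containing a double quote are rejected,
    since no escape rule for them is documented here.
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


# Rebuild the last example above from its parts.
query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'haras")
)
```

Quoting the Streets identifier is optional under the rules above; it is quoted here only to show both helpers.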


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc. All rights reserved.





3 - Appendix

In this section

PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87

Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-ltversiongtzip distribution file Because environments and use cases vary widely we recommend you evaluate the PGD index against a sample to verify that it improves performance

Note A PGD index file is a supplemental file to the TAB file set It is 5-6 times larger than the MAP file for the TAB One PGD file is generated per TAB file except in the case of a seamless TAB which will have PGD files created for each sub-TAB

Also a PGD file will no longer be used by the system if you change the data in the TAB (that is if rows have been added or deleted or a geometry has been changed in the MAP portion of the TAB) In this case you must you must then regenerate the PGD for the updated TAB file

Building an Index with the PGD Builder

Description

This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB

Usage

PGDBuilder -f ltfilegt [-p ltparallelgt]

Parameters

Parameter Required Description

-f ltfilegt yes Path and file name for the TAB file for which you are generating a PGD file

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 81

Appendix

Parameter Required Description

-p ltparallelgt no The maximum number of concurrent processes to run The default is the number of CPUs available on the machine

When generating PGD files for a seamless TAB one thread is used for each sub-TAB and these threads are run concurrently In this case consider setting a lower number to prevent performance issues with your machine

Examples

This request will generate a PGD file for the uktab file

PGDBuilder -f Cdatauktab

This request will generate a PGD file for a seamless TAB file limited to 4 concurrent threads

PGDBuilder -f Cseamlesstab -p 4

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 82

Appendix

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group

Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue

You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt

1 Add the group

sudo groupadd pbdownloads

2 Add users to the group

sudo usermod -a -G pbdownloads hive sudo usermod -a -G pbdownloads yarn sudo usermod -a -G pbdownloads zeppelin sudo usermod -a -G pbdownloads hue sudo usermod -a -G pbdownloads pbuser sudo usermod -a -G pbdownloads ltmyOtherUsergt

3 Using a window where no job is running restart all the services whose operating system users were added to the new group

4 Using a window where no job is running restart the session of all the operating system users that were added to new group (for example pbuser)

5 Update the group to the common operating system group and update permissions to 775 for the download directory specified in pbdownloadlocation property

sudo chgrp pbdownloads pbdownloadssudo chmod 775 pbdownloads



Appendix

PGD Builder

The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.

Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have a PGD file created for each sub-TAB.

Also, the system no longer uses a PGD file if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.

Building an Index with the PGD Builder

Description

This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.

Usage

PGDBuilder -f <file> [-p <parallel>]

Parameters

Parameter        Required   Description

-f <file>        yes        Path and file name of the TAB file for which you are generating a PGD file.

-p <parallel>    no         The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.

Examples

This request generates a PGD file for the uk.tab file:

PGDBuilder -f C:\data\uk.tab

This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:

PGDBuilder -f C:\seamless.tab -p 4
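Because a PGD must be regenerated whenever its TAB changes, it can be convenient to rebuild the indexes for a whole directory in one pass. The following is a minimal sketch, not part of the product: it assumes a Unix-like shell (the examples above use Windows paths), that a launcher named PGDBuilder is on the PATH or is named in the hypothetical PGD_BUILDER variable, and that TAB files end in .tab or .TAB. The function name build_pgd_indexes is also an assumption. Only the documented -f and -p options are used.

```shell
# Hypothetical helper: run PGDBuilder once for each TAB file in a directory.
# Usage: build_pgd_indexes <directory> [<parallel>]
build_pgd_indexes() {
  local dir="$1" parallel="${2:-}" tab
  for tab in "$dir"/*.tab "$dir"/*.TAB; do
    [ -e "$tab" ] || continue            # skip a pattern that matched nothing
    if [ -n "$parallel" ]; then
      # Cap concurrency, as suggested above for seamless TABs.
      "${PGD_BUILDER:-PGDBuilder}" -f "$tab" -p "$parallel" || return 1
    else
      "${PGD_BUILDER:-PGDBuilder}" -f "$tab" || return 1
    fi
  done
}
```

For example, `build_pgd_indexes /data/tabs 4` would invoke the builder with `-p 4` for every TAB file in the hypothetical /data/tabs directory and stop at the first failure.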


Download Permissions

Setting the download permissions allows multiple services to download data and to update the downloaded data when required. Create a common operating system group that contains every service user that needs to download the data. For example, if Hive and YARN jobs must download data and use the same download location, both the Hive and YARN operating system users should belong to a common operating system group. The download directory should belong to this common group and grant Read, Write, and Execute permissions to the owner and group (mode 775).

Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.

You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.

1. Add the group:

sudo groupadd pbdownloads

2. Add users to the group:

sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>

3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.

4. During a window when no jobs are running, restart the session of every operating system user that was added to the new group (for example, pbuser).

5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:

sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
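After completing the steps above, it may help to confirm that the download directory really has the expected group and mode before running jobs. The sketch below is an illustration only: the function name check_download_dir is hypothetical, and it assumes GNU stat (as found on most Linux distributions), where `%G` prints the group name and `%a` prints the octal mode.

```shell
# Hypothetical check: succeed only if <dir> belongs to <group> and has <mode>.
# Usage: check_download_dir <dir> <group> [<mode>]   (mode defaults to 775)
check_download_dir() {
  local dir="$1" group="$2" mode="${3:-775}" actual
  # GNU stat prints "<group name> <octal mode>", for example "pbdownloads 775".
  actual=$(stat -c '%G %a' "$dir") || return 1
  [ "$actual" = "$group $mode" ]
}
```

A call such as `check_download_dir /pb/downloads pbdownloads` returns a zero exit status only when both values match, so it can be used as a guard at the start of a job script.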


Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator              Definition

Attribute operators   =, <>, !=, <, <=, >, >=

Between               Returns true if numeric or date values fall within a range. Between is an inclusive operator.

EnvelopesIntersects   Returns true if the envelopes (MBRs) of the operands intersect.

Contains              Returns true if the first object contains all of the second object.

Within                Returns true if the first object is entirely inside the second object.

ContainsCentroid      Returns true if the first object contains the centroid of the second object.

CentroidWithin        Returns true if the first object's centroid is within the second object.

Intersects            Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.

In (List)             Returns true if the value equals at least one of the values in the literal list or subquery.

Like                  Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.

AND                   Returns true if both conditions in the WHERE clause are true.

OR                    Returns true if either the first or the second condition is true.

NOT                   Reverses the meaning of the logical operator with which it is used.
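The wildcard behavior described for Like can be illustrated outside of MI SQL by translating a pattern into a shell glob, where `%` corresponds to `*` and `_` corresponds to `?`. This is only a sketch of the semantics stated in the table, not the product's implementation: the function name like_match is hypothetical, matching here is case-sensitive (the table does not say whether Like is), and the pattern is assumed to contain no shell glob characters of its own.

```shell
# Hypothetical illustration of Like wildcards using shell pattern matching.
# Usage: like_match <value> <pattern>
like_match() {
  local glob
  # % (zero, one, or more characters) -> *   and   _ (one character) -> ?
  glob=$(printf '%s' "$2" | tr '%_' '*?')
  case "$1" in
    $glob) return 0 ;;   # unquoted on purpose so it is treated as a pattern
  esac
  return 1
}
```

Under these assumptions, `like_match Troy 'T%'` and `like_match Troy 'Tr_y'` succeed, while `like_match Troy 'T_'` fails because the underscore stands for exactly one character.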


Arithmetic Operators

Operator   Definition

+          Addition; also the string concatenation operator. Note: String concatenation also uses &.

-          Subtraction

*          Multiplication

/          Division

^          Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter   Definition

( )         Expression delimiters

' '         String constant delimiters. See Quote Rules.

" "         Quoted identifier delimiters

%, _        Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character

,           List item and function argument separator

@           Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic cannot otherwise parse them correctly, for example identifiers that contain spaces or other special characters.

Examples

Identifiers that contain illegal characters (characters that would not normally be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

Where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'haras is defined:

SELECT * FROM Streets WHERE Business = 'O''haras'
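When queries are assembled in a script, the quoting rules above can be applied mechanically. The helpers below are a minimal sketch under stated assumptions: the names sql_literal and sql_identifier are hypothetical, and sql_identifier does not handle a double quote inside an identifier, because this guide does not describe how that case is escaped.

```shell
# Hypothetical helper: wrap a value in single quotes, doubling embedded single quotes.
sql_literal() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Hypothetical helper: wrap a name in double quotes (embedded " is not handled).
sql_identifier() {
  printf '"%s"\n' "$1"
}
```

For example, `sql_literal "O'haras"` prints `'O''haras'`, and `sql_identifier /Samples/NamedTables/USA` prints `"/Samples/NamedTables/USA"`.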


Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.


Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc. All rights reserved.

  • Table of Contents
  • Welcome
    • What is Spectrumtrade Location Intelligence for Big Data
    • Spectrumtrade Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
      • Spatial
        • Installing the SDK
        • Hive User-Defined Spatial Functions
          • Setup
          • WritableGeometry
          • Geometry Functions
            • Constructor Functions
              • FromGeoJSON
              • FromWKT
              • FromWKB
              • FromKML
              • ST_Point
                • Persistence Functions
                  • ToGeoJSON
                  • ToWKT
                  • ToWKB
                  • ToKML
                    • Predicate Functions
                      • Disjoint
                      • Intersects
                      • Overlaps
                      • Within
                      • IsNullGeometry
                        • Measurement Functions
                          • Area
                          • ClosestPoints
                          • Distance
                          • Length
                          • Perimeter
                            • Processing Functions
                              • Buffer
                              • ConvexHull
                              • Intersection
                              • Transform
                              • Union
                                • Observer Functions
                                  • ST_X
                                  • ST_XMax
                                  • ST_XMin
                                  • ST_Y
                                  • ST_YMax
                                  • ST_YMin
                                    • Grid Functions
                                      • GeoHashBoundary
                                      • GeoHashID
                                      • HexagonBoundary
                                      • HexagonID
                                      • SquareHashBoundary
                                      • SquareHashID
                                          • Search Functions
                                            • LocalPointInPolygon
                                            • LocalSearchNearest
                                                • Spark
                                                  • Spark Jobs
                                                    • Hexagon Generator
                                                      • Hexagons
                                                          • Spark API
                                                            • JoinByDistance
                                                              • Location Intelligence Jar for Spatial Operations
                                                                • Installing and setting up the Location Intelligence Jar for a Spark Job
                                                                • Integrating the Location Intelligence Jar with Zeppelin
                                                                • Integrating the Location Intelligence Jar with Hue
                                                                  • Appendix
                                                                    • PGD Builder
                                                                      • Building an Index with the PGD Builder
                                                                        • Download Permissions
                                                                        • Operators and Syntax Delimiters
                                                                          • Quote Rules
Page 82: Spectrum Location Intelligence - Pitney Bowes · 2020-01-28 · Location Intelligence for Big Data 4.0 Spectrum Location Intelligence for Big Data User Guide 6. 2 -Spatial. This section

Appendix

Parameter Required Description

-p ltparallelgt no The maximum number of concurrent processes to run The default is the number of CPUs available on the machine

When generating PGD files for a seamless TAB one thread is used for each sub-TAB and these threads are run concurrently In this case consider setting a lower number to prevent performance issues with your machine

Examples

This request will generate a PGD file for the uktab file

PGDBuilder -f Cdatauktab

This request will generate a PGD file for a seamless TAB file limited to 4 concurrent threads

PGDBuilder -f Cseamlesstab -p 4

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 82

Appendix

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group

Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue

You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt

1 Add the group

sudo groupadd pbdownloads

2 Add users to the group

sudo usermod -a -G pbdownloads hive sudo usermod -a -G pbdownloads yarn sudo usermod -a -G pbdownloads zeppelin sudo usermod -a -G pbdownloads hue sudo usermod -a -G pbdownloads pbuser sudo usermod -a -G pbdownloads ltmyOtherUsergt

3 Using a window where no job is running restart all the services whose operating system users were added to the new group

4 Using a window where no job is running restart the session of all the operating system users that were added to new group (for example pbuser)

5 Update the group to the common operating system group and update permissions to 775 for the download directory specified in pbdownloadlocation property

sudo chgrp pbdownloads pbdownloadssudo chmod 775 pbdownloads

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 83

Appendix

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below

Boolean Operators

Operator Definition

Attribute operators = lt gt = lt lt= gt gt=

Between Returns true if numeric or date values fall within a range Between is an inclusive operator

EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect

Contains Returns true if the first object contains all of the second object

Within Returns true if the first object is entirely inside the second object

ContainsCentroid Returns true if the first object contains the centroid of the second object

CentroidWithin Returns true if the first objects centroid is within the second object

Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object

In (List) Returns true if equals at least one of the values in the literal list or sub query

Like Returns true if the value can be compared to similar values using wildcard characters There are two wildcards used in conjunction with the Like operator Underscore _ and Percent The underscore represents a single number or character The percent sign represents zero one or multiple characters The symbols can be used in combination

AND Returns true if both conditions in the WHERE clause are true

OR Returns true if either the first or second condition is true

NOT Reverses the meaning of the logical operator with which it is used

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 84

Appendix

Arithmetic Operators

Operator Definition

+ Addition also concatenation operator NOTE String concatenation also uses amp

- Subtraction

Multiplication

Division

^ Exponentiation

Note Operator math on Time or DateTime is not supported You can add a number to a Date but not to a Time or DateTime

Syntax Delimiters

Delimiter Definition

( ) Expression delimiters

String constant delimiters See Quote Rules on page 85

Quoted identifier delimiters

_ Wildcard symbols represents zero one or more characters the_ (underscore) represents a single character

List items and function argument separators

Parameter names

Quote Rules

The MI SQL language uses standard quoting rules String literals (values) must be enclosed in single quotation marks (example) while identifiers (column names table names aliases and so on) should be enclosed in double quotation marks (example identifier) if necessary Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier This would include identifiers that have spaces in their names or other special characters

Examples

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 85

Appendix

Identifiers or illegal characters (where they normally would not be allowed such as ) are surrounded by double-quotes

SELECT FROM SamplesNamedTablesUSA

String literals or values are surrounded by single-quotes

SELECT FROM SamplesNamedTablesUSA WHERE Country = Canada

In certain cases where a single-quote is within a string literal or value use a double-single-quote (two characters) In the following example the string literal Oharas is defined

SELECT FROM Streets WHERE Business = Oharas

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 86

Appendix

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives No part of this document may be reproduced or transmitted in any form or by any means electronic or mechanical including photocopying without the written permission of Pitney Bowes Software Inc 350 Jordan Road Troy New York 12180 copy 2020 Pitney Bowes Software Inc All rights reserved Location Intelligence APIs are trademarks of Pitney Bowes Software Inc All other marks and trademarks are property of their respective holders

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 87

pitney bowes 0 3001 Summer Street

Stamford CT 06926-0700

USA

wwwpitneybowescom

copy 2020 Pitney Bowes Software Inc

All rights reserved

  • Table of Contents
  • Welcome
    • What is Spectrumtrade Location Intelligence for Big Data
    • Spectrumtrade Location Intelligence for Big Data Architecture
    • System Requirements and Dependencies
      • Spatial
        • Installing the SDK
        • Hive User-Defined Spatial Functions
          • Setup
          • WritableGeometry
          • Geometry Functions
            • Constructor Functions
              • FromGeoJSON
              • FromWKT
              • FromWKB
              • FromKML
              • ST_Point
                • Persistence Functions
                  • ToGeoJSON
                  • ToWKT
                  • ToWKB
                  • ToKML
                    • Predicate Functions
                      • Disjoint
                      • Intersects
                      • Overlaps
                      • Within
                      • IsNullGeometry
                        • Measurement Functions
                          • Area
                          • ClosestPoints
                          • Distance
                          • Length
                          • Perimeter
                            • Processing Functions
                              • Buffer
                              • ConvexHull
                              • Intersection
                              • Transform
                              • Union
                                • Observer Functions
                                  • ST_X
                                  • ST_XMax
                                  • ST_XMin
                                  • ST_Y
                                  • ST_YMax
                                  • ST_YMin
                                    • Grid Functions
                                      • GeoHashBoundary
                                      • GeoHashID
                                      • HexagonBoundary
                                      • HexagonID
                                      • SquareHashBoundary
                                      • SquareHashID
                                          • Search Functions
                                            • LocalPointInPolygon
                                            • LocalSearchNearest
                                                • Spark
                                                  • Spark Jobs
                                                    • Hexagon Generator
                                                      • Hexagons
                                                          • Spark API
                                                            • JoinByDistance
                                                              • Location Intelligence Jar for Spatial Operations
                                                                • Installing and setting up the Location Intelligence Jar for a Spark Job
                                                                • Integrating the Location Intelligence Jar with Zeppelin
                                                                • Integrating the Location Intelligence Jar with Hue
                                                                  • Appendix
                                                                    • PGD Builder
                                                                      • Building an Index with the PGD Builder
                                                                        • Download Permissions
                                                                        • Operators and Syntax Delimiters
                                                                          • Quote Rules
Page 83: Spectrum Location Intelligence - Pitney Bowes · 2020-01-28 · Location Intelligence for Big Data 4.0 Spectrum Location Intelligence for Big Data User Guide 6. 2 -Spatial. This section

Appendix

Download Permissions

Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group

Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue

You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt

1 Add the group

sudo groupadd pbdownloads

2 Add users to the group

sudo usermod -a -G pbdownloads hive sudo usermod -a -G pbdownloads yarn sudo usermod -a -G pbdownloads zeppelin sudo usermod -a -G pbdownloads hue sudo usermod -a -G pbdownloads pbuser sudo usermod -a -G pbdownloads ltmyOtherUsergt

3 Using a window where no job is running restart all the services whose operating system users were added to the new group

4 Using a window where no job is running restart the session of all the operating system users that were added to new group (for example pbuser)

5 Update the group to the common operating system group and update permissions to 775 for the download directory specified in pbdownloadlocation property

sudo chgrp pbdownloads pbdownloadssudo chmod 775 pbdownloads

Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 83

Appendix

Operators and Syntax Delimiters

The supported operators and syntax delimiters in the MI SQL language are outlined below.

Boolean Operators

Operator             Definition
Attribute operators  =, <>, !=, <, <=, >, >=
Between              Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects  Returns true if the envelopes (MBRs) of the operands intersect.
Contains             Returns true if the first object contains all of the second object.
Within               Returns true if the first object is entirely inside the second object.
ContainsCentroid     Returns true if the first object contains the centroid of the second object.
CentroidWithin       Returns true if the first object's centroid is within the second object.
Intersects           Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)            Returns true if the value equals at least one of the values in the literal list or subquery.
Like                 Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                  Returns true if both conditions in the WHERE clause are true.
OR                   Returns true if either the first or second condition is true.
NOT                  Reverses the meaning of the logical operator with which it is used.
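
The wildcard behavior described for Like can be approximated outside MI SQL. The sketch below is only an analogy, not the product's implementation: it translates the percent sign to the shell glob * and the underscore to ?, then matches with a case statement. The function name like_match is hypothetical, the pattern is assumed to contain no other glob metacharacters, and shell matching is case-sensitive, which may differ from MI SQL.

```shell
#!/bin/sh
# like_match VALUE PATTERN
# Approximates Like matching by translating its two wildcards to shell globs:
#   %  ->  *   (zero, one, or multiple characters)
#   _  ->  ?   (exactly one character)
# Assumption: PATTERN contains no other glob metacharacters (*, ?, [).
like_match() {
  glob=$(printf '%s' "$2" | tr '%_' '*?')
  case "$1" in
    $glob) return 0 ;;   # unquoted, so the translated wildcards stay active
    *)     return 1 ;;
  esac
}

like_match "Canada" "Can%"   && echo "Can% matches Canada"
like_match "Canada" "C_nada" && echo "C_nada matches Canada"
like_match "Canada" "_%a"    && echo "_%a matches Canada"
like_match "Canada" "Can_"   || echo "Can_ does not match Canada"
```

The last call fails to match because each underscore stands for exactly one character, so Can_ only matches four-character values.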

Arithmetic Operators

Operator  Definition
+         Addition; also the string concatenation operator. Note: String concatenation also uses &.
-         Subtraction
*         Multiplication
/         Division
^         Exponentiation

Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.

Syntax Delimiters

Delimiter  Definition
( )        Expression delimiters
' '        String constant delimiters. See Quote Rules.
" "        Quoted identifier delimiters
% _        Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,          List item and function argument separators
@          Parameter names

Quote Rules

The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.

Examples

Identifiers that contain otherwise illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:

SELECT * FROM "/Samples/NamedTables/USA"

String literals or values are surrounded by single quotes:

SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'

In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:

SELECT * FROM Streets WHERE Business = 'O''hara''s'
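
When a query string is assembled by a script, the quote rules above can be applied mechanically. The following is a minimal sketch under stated assumptions, not a vendor utility: the helper names sql_literal and sql_identifier, the table name Street Names, and the value O'Brien are all hypothetical, and the identifier helper does not handle embedded double quotes.

```shell
#!/bin/sh
# sql_literal VALUE
# Wraps VALUE in single quotes and doubles every embedded single quote.
sql_literal() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# sql_identifier NAME
# Wraps NAME in double quotes (embedded double quotes are not handled here).
sql_identifier() {
  printf '"%s"\n' "$1"
}

business="O'Brien"   # hypothetical value containing a single quote
echo "SELECT * FROM $(sql_identifier "Street Names") WHERE Business = $(sql_literal "$business")"
# prints: SELECT * FROM "Street Names" WHERE Business = 'O''Brien'
```

The table name is quoted because it contains a space. An identifier without spaces or special characters could be left unquoted.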

Copyright

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.

Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.
All rights reserved.

Pitney Bowes

3001 Summer Street

Stamford CT 06926-0700

USA

www.pitneybowes.com

© 2020 Pitney Bowes Software Inc.

All rights reserved.
