table of contentsgptext.docs.pivotal.io › archives › gptext-docs-212.pdf · installing gptext...

105
1 2 3 4 7 10 13 17 19 20 24 31 37 45 50 53 55 59 60 87 104 105 Table of Contents Table of Contents Pivotal GPText 2.1.2 Documentation Release Notes Pivotal® GPText 2.1.2 Release Notes Pivotal® GPText 2.1.1 Release Notes Pivotal® GPText 2.1 Release Notes Installing GPText Upgrading GPText Using Pivotal GPText Introduction to Pivotal GPText Working With GPText Indexes Querying GPText Indexes Customizing GPText Indexes Administering GPText GPText High Availability GPText Best Practices Glossary GPText References GPText Function Reference GPText Management Utilities GPText and Solr Data Type Mappings GPText Configuration Parameters © Copyright Pivotal Software, Inc, 2013-2017 1 2.1.2

Upload: others

Post on 25-Jun-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

12347101317192024313745505355596087104105

TableofContents

TableofContentsPivotalGPText2.1.2DocumentationReleaseNotesPivotal®GPText2.1.2ReleaseNotesPivotal®GPText2.1.1ReleaseNotesPivotal®GPText2.1ReleaseNotesInstallingGPTextUpgradingGPTextUsingPivotalGPTextIntroductiontoPivotalGPTextWorkingWithGPTextIndexesQueryingGPTextIndexesCustomizingGPTextIndexesAdministeringGPTextGPTextHighAvailabilityGPTextBestPracticesGlossaryGPTextReferencesGPTextFunctionReferenceGPTextManagementUtilitiesGPTextandSolrDataTypeMappingsGPTextConfigurationParameters

©CopyrightPivotalSoftware,Inc,2013-2017 1 2.1.2

Page 2: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

PivotalGPText2.1.2Documentation

PivotalGPTextDocumentationPDF

PivotalGPText2.1.2ReleaseNotes

InstallingPivotalGPText

UpgradingPivotalGPText

UsingPivotalGPText

GPTextReferences

AdditionalResourcesPivotalGreenplumDatabase

ApacheSolrWebSite

ApacheMADlib

©CopyrightPivotalSoftware,Inc,2013-2017 2 2.1.2

Page 3: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

ReleaseNotesRelease-specificinformationincludingnew,changed,anddeprecatedfeatures,resolvedandknownissues,migrationinformation,andsupportedhardwareandsoftware.

PivotalGPText2.1.1ReleaseNotes

PivotalGPText2.1.0ReleaseNotes

PivotalGPText2.0.0ReleaseNotes

©CopyrightPivotalSoftware,Inc,2013-2017 3 2.1.2

Page 4: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Pivotal®GPText2.1.2ReleaseNotesThisdocumentcontainsreleaseinformationforPivotalGPText2.1.2.

Published:August,2017

AboutPivotalGPTextPivotalGPTextjoinstheGreenplumDatabasemassivelyparallel-processingdatabaseserverwithApacheSolrCloudenterprisesearchandtheApacheMADlibAnalyticsLibrarytoprovidelarge-scaleanalyticsprocessingandbusinessdecisionsupport.GPTextincludesfreetextsearchaswellassupportfortextanalysis.

GPTextincludesthefollowingfeatures:

TheGPTextdatabaseschemaprovidesin-databaseaccesstoApacheSolrindexingandsearching

Customtokenizersforinternationaltextandsocialmediatext

AUniversalQueryProcessorthatacceptsquerieswithmixedsyntaxfromsupportedSolrqueryprocessors

Facetedsearchresults

Termhighlightinginresults

Greateremphasisonhighavailability

TheGPTextmanagementutilitysuiteincludescommand-lineutilitiestoperformthefollowingtasks:

Start,stop,andmonitorZooKeeperandGPTextnodes

ConfigureGPTextnodesandindexes

Addanddeletereplicasforindexshards

BackupandrestoreGPTextindexes

RecoveraGPTextnode

ExpandtheGPTextclusterbyaddingGPTextnodes

PrerequisitesInstallingGPTextalsoinstallsApacheSolrCloud6.1and,optionally,ApacheZooKeeper.

FollowingareGPTextinstallationprerequisites.

InstallandconfigureyourGreenplumDatabasesystem,version4.3.6orhigher.SeethePivotalGreenplumDatabaseInstallationGuideathttps://gpdb.docs.pivotal.io .

GPTextrunsonRedHatEnterpriseLinux5.xor6.x.

GPTextcannotbeinstalledontoasharedNFSmount.

InstallOracleJRE1.8.xandaddits bin directorytothe PATH onallhostsinthecluster.

Ensurethat nc (netcat)isinstalledonallGreenplumclusterhosts( sudo yum install nc ).

Installing lsof onallclusterhostsisrecommended( sudo yum install lsof ).

GPTextnodescanbeinstalledontheGreenplumDatabaseclusterhostsalongsidetheGreenplumsegmentsoronadditional,non-databasehostsaccessibleontheGreenplumclusternetwork.AllhostsparticipatingintheGPTextsystemmusthavethesameoperatingsystemandconfigurationandhavepasswordless-sshaccessforthegpadminuser.SeethePivotalGreenplumDatabaseInstallationGuideforinstructionstoconfigurehosts.

IfyouplantoplaceGPTextnodesontheGreenplumDatabasesegmenthosts,ensurethatyoureservememoryforGPTextusewhenyouconfigureGreenplumDatabase.TodeterminethememorytosetasideforGPText,multiplythenumberofGPTextnodestocreateoneachGreenplumsegmenthostbytheJVMmaximumsize.SubtractthismemoryfromthephysicalRAMwhencalculatingthevaluefortheGreenplumDatabasegp_vmem_protect_limit serverconfigurationparameter.SeetheGreenplumDatabaseserverconfigurationparameter gp_vmem_protect_limit intheGreenplumDatabaseReferenceGuideforrecommendedmemorycalculationformulasorvisittheGPDBVirtualMemoryCalculator website.

ApacheSolrrequiresaZooKeeperclusterwithatminimumthreenodes(fivenodesrecommended).Youcaninstalla“binding”ZooKeeperclusterwith

GPTextrequiresOracleJDK1.8.x.YoucannotuseanOpenJDKJREwithGPText.

©CopyrightPivotalSoftware,Inc,2013-2017 4 2.1.2

Page 5: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

GPTextontheGreenplumclusterhosts,oryoucanuseanexistingZooKeepercluster.WhendeployedalongsideGreenplumDatabasesegments,ZooKeeperperformancecanbeaffectedunderheavydatabaseload.Forbestperformance,installaZooKeeperclusteronseparatehostswithnetworkconnectivitytotheGreenplumnetwork.

NewFeatures,Enhancements,andFixesinGPText2.1.2

FeaturesandEnhancementsAnoptionalSolroptionsparameterisaddedtothefollowingGPTextfunctions:

gptext.search_count()

gptext.faceted_range_search()

gptext.faceted_query_search()

gptext.faceted_field_search()

The options parameterisanampersand-delimitedlistofSolroptionstoincludeinthesearch.SeeSolroptionsformoreaboutusingSolroptions.

AddedsupportfortheGreenplumDatabase5.0 uuid datatype.GreenplumDatabase uuid columnsaremappedtothe solr.UUIDField typeinGPTextindexes.Columnsof uuid typemayalsobespecifiedfortheuniqueidcolumn( id_col )inthe gptext.create_index() function.

Changedthedefaultvalueofthe term_batch_size GPTextconfigurationparameterfrom50000to1000.Thisreducesthepossibilityofanoutofmemory(OOM)errorwhenexecutingthe gptext.terms() function.IfyouexperienceOOMerrorswiththisnewdefaultvalue,youmayneedtofurtherreducethevalueof term_batch_size .FormoreinformationseeTermsQueriesandOutofMemoryErrors.

Inpreviousreleases,auserwaspermittedaccesstoaGPTextindexiftheirrolehadpermissiontoaccesstheGreenplumDatabasetablefromwhichtheindexwascreated.Permissionsarenolongercheckedonthebasetable.ThismakesitpossibletodropthebasetableandcontinuetosearchtheGPTextindex.GPTextfunctionsthatdependupontheexistenceofthebasetable,suchas gptext.add_field() ,arenotallowedafterthedatabasetablehasbeendropped.Ifthetablewaspartitioned,GPTextqueriesmustspecifytheroottablename.

FixesWhenthe defType=dismax SolroptionwasaddedtotheSolroptionsparameterofthe gptext.search() function,GPTextlowercasedtheoptionnameto deftype beforesubmittingthequerytoSolr,causingthequerytoreturnunexpectedresults.Thishasbeenfixed.

KnownIssuesFollowingareknownissuesinGPText.Workaroundsareprovidedwhenavailable.

WildcardsinGPTextSearchOptionsSolrdoesnotreturnallfieldswhenthe fl Solrsearchoptioncontainsawildcardthatmatchesfieldnames.Forexample,givenatablewithcolumnscontenta and contentb ,specifying fl=contenta,contentb,(sum,1,1) correctlyreturnsthreefields.Specifying fl=cont*,sum(1,1) correctlyreturns contenta andcontentb ,butomitsthepseudo-field sum(1,1) .

Specifyingawildcardtomatchallfields( fl=*,sum(1,1) )alsoomitsthepseudo-field.

IndexLoadFailureAfterConfigurationFileErrorIfSolrfailstoloadanindexbecauseofaconfigurationfileerror,andthentheindexisdroppedwithoutfirstcorrectingtheconfigurationfileerror,theindexcannotberecreateduntilGPTextisrestarted.Thiscanhappenifyouedit managed-schema or solrconfig.xml andintroduceanXMLsyntaxerrororatypoinconfigurationvalues.

Workaround:

1. Whenanindexfailstoload,checktheSolrlogtofindthecause.

2. Ifthecauseisaconfigurationfileerror,suchasinvalidXML,usethe gptext-config utilitytoeditthefileandfixtheerror.Droppingtheindexwithoutfirstcorrectingtheerrorisnotrecommended.

©CopyrightPivotalSoftware,Inc,2013-2017 5 2.1.2

Page 6: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

3. Ifyouhavedroppedanindexthatfailedtoloadwithoutfirstcorrectingthecauseofthefailure,youmustrestartGPTextbeforeyoucanrecreatetheindex.Run gptext-start -r torestartGPText.

StartupFailurewithLargeNumbersofIndexesWhenthereisalargenumberofSolrcores,SolrCloudcanfailtorestartsuccessfully,witherrormessagesindicatingfailuretoelectleadersforshards.ThisisaknownSolrissue;seehttps://issues.apache.org/jira/browse/SOLR-5990 intheApacheSolrJiraforanexample.Becauseofthisissue,itisrecommendedtoavoiddesigningGPTextapplicationsthatcreatelargenumbersofindexes,shards,andreplicas.Thenumberofcoresyoucancreatebeforeyouobservethisbehaviorishardwaredependent,soyoushouldtesttodetermineyoursystem’slimits.Youcancreateandsuccessfullyoperatealargernumbersofindexesthancanberestartedsuccessfullylater,sobesuretotestrestartingGPTexttodetermineapracticallimit.

SettingGPTextConfigurationParametersWithoutFirstSettingcustom_variable_classesIfthe custom_variable_classes GreenplumDatabaseserverconfigurationparameterdoesnotincludethevalue“gptext”,attemptingtosetaGPTextconfigurationparameterreturnsanerrormessage,forexample:

mydb-#setgptext.replication_factor=4;WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)ERROR:unrecognizedconfigurationparameter"gptext.replication_factor"

InGPText2.0,inadditiontotheerrormessage,thevalueoftheconfigurationparameterpersistedinZooKeeperiszero,replacingthepreviousvalueoftheparameter.

mydb-#showgptext.replication_factor;gptext.replication_factor----------------------------0

BeginningwithGPText2.1,theerrormessageisstillgenerated,howeverthevaluesavedinZooKeeperisthevaluespecifiedinthe set command,4intheprecedingexample.

Topreventtheerrormessage,beforesettinganyGPTextconfigurationparameters,usethe gpconfig command-lineutilitytosetthe custom_variable_classes

configurationparameter:

$gpconfig-ccustom_variable_classes-v'gptext'

CannotSourcegreenplum_path.shaftergreenplum-text_path.shTheGPText greenplum-text_path.sh scriptrequiresthattheGreenplumDatabase greenplum_path.sh scriptberunfirst.Forexample:

$source/usr/local/greenplum-db/greenplum_path.sh$source/usr/local/greenplum-text/greenplum-text_path.sh

TheGPTextscriptmodifiesthe PATH and PYTHONPATH environmentvariablessetpreviouslybytheGreenplumDatabasescript.

Ifyousource greenplum_path.sh againafteryouhaverun greenplum-text_path.sh ,GPText’s PYTHONPATH isoverwrittenandGPTextfails.

Aworkaroundistosourcethe greenplum-text_path.sh scriptfromthe greenplum_path.sh script.Editthefile $GPHOME/greenplum_path.sh andaddthefollowinglinetotheendofthefile:

source/usr/local/greenplum-text/greenplum-text_path.sh``

©CopyrightPivotalSoftware,Inc,2013-2017 6 2.1.2

Page 7: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Pivotal®GPText2.1.1ReleaseNotesThisdocumentcontainsreleaseinformationforPivotalGPText2.1.1.

Published:July2017

AboutPivotalGPTextPivotalGPTextjoinstheGreenplumDatabasemassivelyparallel-processingdatabaseserverwithApacheSolrCloudenterprisesearchandtheApacheMADlibAnalyticsLibrarytoprovidelarge-scaleanalyticsprocessingandbusinessdecisionsupport.GPTextincludesfreetextsearchaswellassupportfortextanalysis.

GPTextincludesthefollowingfeatures:

TheGPTextdatabaseschemaprovidesin-databaseaccesstoApacheSolrindexingandsearching

Customtokenizersforinternationaltextandsocialmediatext

AUniversalQueryProcessorthatacceptsquerieswithmixedsyntaxfromsupportedSolrqueryprocessors

Facetedsearchresults

Termhighlightinginresults

Greateremphasisonhighavailability

TheGPTextmanagementutilitysuiteincludescommand-lineutilitiestoperformthefollowingtasks:

Start,stop,andmonitorZooKeeperandGPTextnodes

ConfigureGPTextnodesandindexes

Addanddeletereplicasforindexshards

BackupandrestoreGPTextindexes

RecoveraGPTextnode

ExpandtheGPTextclusterbyaddingGPTextnodes

PrerequisitesInstallingGPTextalsoinstallsApacheSolrCloudand,optionally,ApacheZooKeeper.

FollowingareGPTextinstallationprerequisites.

InstallandconfigureyourGreenplumDatabasesystem,version4.3.6orhigher.SeethePivotalGreenplumDatabaseInstallationGuideathttps://gpdb.docs.pivotal.io .

GPTextrunsonRedHatEnterpriseLinux5.xor6.x.

GPTextcannotbeinstalledontoasharedNFSmount.

InstallOracleJRE1.8.xandaddits bin directorytothe PATH onallhostsinthecluster.

Ensurethat nc (netcat)isinstalledonallGreenplumclusterhosts( sudo yum install nc ).

Installing lsof onallclusterhostsisrecommended( sudo yum install lsof ).

GPTextnodescanbeinstalledontheGreenplumDatabaseclusterhostsalongsidetheGreenplumsegmentsoronadditional,non-databasehostsaccessibleontheGreenplumclusternetwork.AllhostsparticipatingintheGPTextsystemmusthavethesameoperatingsystemandconfigurationandhavepasswordless-sshaccessforthegpadminuser.SeethePivotalGreenplumDatabaseInstallationGuideforinstructionstoconfigurehosts.

IfyouplantoplaceGPTextnodesontheGreenplumDatabasesegmenthosts,ensurethatyoureservememoryforGPTextusewhenyouconfigureGreenplumDatabase.TodeterminethememorytosetasideforGPText,multiplythenumberofGPTextnodestocreateoneachGreenplumsegmenthostbytheJVMmaximumsize.SubtractthismemoryfromthephysicalRAMwhencalculatingthevaluefortheGreenplumDatabasegp_vmem_protect_limit serverconfigurationparameter.SeetheGreenplumDatabaseserverconfigurationparameter gp_vmem_protect_limit intheGreenplumDatabaseReferenceGuideforrecommendedmemorycalculationformulasorvisittheGPDBVirtualMemoryCalculator website.

ApacheSolrrequiresaZooKeeperclusterwithatminimumthreenodes(fivenodesrecommended).Youcaninstalla“binding”ZooKeeperclusterwith

GPTextrequiresOracleJDK1.8.x.YoucannotuseanOpenJDKJREwithGPText.

©CopyrightPivotalSoftware,Inc,2013-2017 7 2.1.2

Page 8: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

GPTextontheGreenplumclusterhosts,oryoucanuseanexistingZooKeepercluster.WhendeployedalongsideGreenplumDatabasesegments,ZooKeeperperformancecanbeaffectedunderheavydatabaseload.Forbestperformance,installaZooKeeperclusteronseparatehostswithnetworkconnectivitytotheGreenplumnetwork.

NewFeatures,Enhancements,andFixesinGPText2.1.1

FeaturesandEnhancementsAnewGPTextinstallationparameter, GPTEXT_JAVA_HOME ,hasbeenaddedtothe gptext_install_config installationconfigurationfiletospecifythehomedirectoryoftheJavainstallationtouseforrunningZooKeeperandSolrprocesses.Ifthisparameterisnotset,theJREisdeterminedfromthePATH and JAVA_HOME environmentvariables.IfyouhaveJavaapplicationsthatrequireaversionofJavanotsupportedwithGPText,youcansetthatversionintheenvironmentvariablesandsetthe GPTEXT_JAVA_HOME parametertooverrideit.

GPText2.1.1canbeusedwithGreenplumDatabase5.0Beta.AGPTextsystemdeployedonGreenplumDatabase4.3.xcanbemigratedtoGreenplumDatabase5.0.

GPTextnowcansupportuniqueidcolumnswithdatatypes int4 , int8 , varchar ,and text . gptext.search() resultsreturntheidcolumnastext values.Ifthe id_col isan int4 or int8 type,itmaybenecessarytocasttheidcolumninreturnedresultstothecorrectnumerictype.Forexample,ifyoujoinsearchresultswiththeoriginaltableororderresultsbynumericid,youwillneedtocastthereturnedidcolumntothecorrecttype.

GPTexthasbeenoptimizedtolookupconfigurationparameters(GUCs)inZooKeeperontheGreenplummasterandthendistributetheparameterstothesegments,eliminatingZooKeeperlookupsfromeveryGPTextinstance.

New timestamp and ttimestamp fieldtypesareaddedtotheSolr managed-schema toallowGPTexttostoretimestampswiththesameprecisionasGreenplumDatabase.Ifyourequiremicro-secondprecisionforfieldsstoredinaGPTextindexyoucaneditthe managed-schema andchangethetype attributeofthe <field> elementfrom date or tdate to timestamp or ttimestamp .

Fixes[MPP-28960]GPTextinstallationfailedwhensshwasconfiguredtorequirefully-qualifieddomainname(FQDN).Thishasbeenfixed.

KnownIssuesFollowingareknownissuesinGPText.Workaroundsareprovidedwhenavailable.

WildcardsinGPTextSearchOptionsSolrdoesnotreturnallfieldswhenthe fl Solrsearchoptioncontainsawildcardthatmatchesfieldnames.Forexample,givenatablewithcolumnscontenta and contentb ,specifying fl=contenta,contentb,(sum,1,1) correctlyreturnsthreefields.Specifying fl=cont*,sum(1,1) correctlyreturns contenta andcontentb ,butomitsthepseudo-field sum(1,1) .

Specifyingawildcardtomatchallfields( fl=*,sum(1,1) )alsoomitsthepseudo-field.

IndexLoadFailureAfterConfigurationFileErrorIfSolrfailstoloadanindexbecauseofaconfigurationfileerror,andthentheindexisdroppedwithoutfirstcorrectingtheconfigurationfileerror,theindexcannotberecreateduntilGPTextisrestarted.Thiscanhappenifyouedit managed-schema or solrconfig.xml andintroduceanXMLsyntaxerrororatypoinconfigurationvalues.

Workaround:

1. Whenanindexfailstoload,checktheSolrlogtofindthecause.

2. Ifthecauseisaconfigurationfileerror,suchasinvalidXML,usethe gptext-config utilitytoeditthefileandfixtheerror.Droppingtheindexwithoutfirstcorrectingtheerrorisnotrecommended.

3. Ifyouhavedroppedanindexthatfailedtoloadwithoutfirstcorrectingthecauseofthefailure,youmustrestartGPTextbeforeyoucanrecreatetheindex.Run gptext-start -r torestartGPText.

©CopyrightPivotalSoftware,Inc,2013-2017 8 2.1.2

Page 9: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

StartupFailurewithLargeNumbersofIndexesWhenthereisalargenumberofSolrcores,SolrCloudcanfailtorestartsuccessfully,witherrormessagesindicatingfailuretoelectleadersforshards.ThisisaknownSolrissue;seehttps://issues.apache.org/jira/browse/SOLR-5990 intheApacheSolrJiraforanexample.Becauseofthisissue,itisrecommendedtoavoiddesigningGPTextapplicationsthatcreatelargenumbersofindexes,shards,andreplicas.Thenumberofcoresyoucancreatebeforeyouobservethisbehaviorishardwaredependent,soyoushouldtesttodetermineyoursystem’slimits.Youcancreateandsuccessfullyoperatealargernumbersofindexesthancanberestartedsuccessfullylater,sobesuretotestrestartingGPTexttodetermineapracticallimit.

SettingGPTextConfigurationParametersWithoutFirstSettingcustom_variable_classesIfthe custom_variable_classes GreenplumDatabaseserverconfigurationparameterdoesnotincludethevalue“gptext”,attemptingtosetaGPTextconfigurationparameterreturnsanerrormessage,forexample:

mydb-#setgptext.replication_factor=4;WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)ERROR:unrecognizedconfigurationparameter"gptext.replication_factor"

InGPText2.0,inadditiontotheerrormessage,thevalueoftheconfigurationparameterpersistedinZooKeeperiszero,replacingthepreviousvalueoftheparameter.

mydb-#showgptext.replication_factor;gptext.replication_factor----------------------------0

BeginningwithGPText2.1,theerrormessageisstillgenerated,howeverthevaluesavedinZooKeeperisthevaluespecifiedinthe set command,4intheprecedingexample.

Topreventtheerrormessage,beforesettinganyGPTextconfigurationparameters,usethe gpconfig command-lineutilitytosetthe custom_variable_classes

configurationparameter:

$gpconfig-ccustom_variable_classes-v'gptext'

CannotSourcegreenplum_path.shaftergreenplum-text_path.shTheGPText greenplum-text_path.sh scriptrequiresthattheGreenplumDatabase greenplum_path.sh scriptberunfirst.Forexample:

$source/usr/local/greenplum-db/greenplum_path.sh$source/usr/local/greenplum-text/greenplum-text_path.sh

TheGPTextscriptmodifiesthe PATH and PYTHONPATH environmentvariablessetpreviouslybytheGreenplumDatabasescript.

Ifyousource greenplum_path.sh againafteryouhaverun greenplum-text_path.sh ,GPText’s PYTHONPATH isoverwrittenandGPTextfails.

Aworkaroundistosourcethe greenplum-text_path.sh scriptfromthe greenplum_path.sh script.Editthefile $GPHOME/greenplum_path.sh andaddthefollowinglinetotheendofthefile:

source/usr/local/greenplum-text/greenplum-text_path.sh``

©CopyrightPivotalSoftware,Inc,2013-2017 9 2.1.2

Page 10: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Pivotal®GPText2.1ReleaseNotesThisdocumentcontainsreleaseinformationforPivotalGPText2.1.

Published:June2017

AboutPivotalGPTextPivotalGPTextjoinstheGreenplumDatabasemassivelyparallel-processingdatabaseserverwithApacheSolrCloudenterprisesearchandtheApacheMADlibAnalyticsLibrarytoprovidelarge-scaleanalyticsprocessingandbusinessdecisionsupport.GPTextincludesfreetextsearchaswellassupportfortextanalysis.

GPTextincludesthefollowingfeatures:

TheGPTextdatabaseschemaprovidesin-databaseaccesstoApacheSolrindexingandsearching

Customtokenizersforinternationaltextandsocialmediatext

AUniversalQueryProcessorthatacceptsquerieswithmixedsyntaxfromsupportedSolrqueryprocessors

Facetedsearchresults

Termhighlightinginresults

Greateremphasisonhighavailability

TheGPTextmanagementutilitysuiteincludescommand-lineutilitiestoperformthefollowingtasks:

Start,stop,andmonitorZooKeeperandGPTextnodes

ConfigureGPTextnodesandindexes

Addanddeletereplicasforindexshards

BackupandrestoreGPTextindexes

RecoveraGPTextnode

ExpandtheGPTextclusterbyaddingGPTextnodes

PrerequisitesInstallingGPTextalsoinstallsApacheSolrCloudand,optionally,ApacheZooKeeper.

FollowingareGPTextinstallationprerequisites.

InstallandconfigureyourGreenplumDatabasesystem,version4.3.6orhigher.SeethePivotalGreenplumDatabaseInstallationGuideathttps://gpdb.docs.pivotal.io .

GPTextrunsonRedHatEnterpriseLinux5.xor6.x.

GPTextcannotbeinstalledontoasharedNFSmount.

InstallOracle JRE 1.8.x andaddits bin directorytothe PATH onallhostsinthecluster.

Ensurethat nc (netcat)isinstalledonallGreenplumclusterhosts( sudo yum install nc ).

Installing lsof onallclusterhostsisrecommended( sudo yum install lsof ).

GPTextnodescanbeinstalledontheGreenplumDatabaseclusterhostsalongsidetheGreenplumsegmentsoronadditional,non-databasehostsaccessibleontheGreenplumclusternetwork.AllhostsparticipatingintheGPTextsystemmusthavethesameoperatingsystemandconfigurationandhavepasswordless-sshaccessforthegpadminuser.SeethePivotalGreenplumDatabaseInstallationGuideforinstructionstoconfigurehosts.

IfyouplantoplaceGPTextnodesontheGreenplumDatabasesegmenthosts,ensurethatyoureservememoryforGPTextusewhenyouconfigureGreenplumDatabase.TodeterminethememorytosetasideforGPText,multiplythenumberofGPTextnodestocreateoneachGreenplumsegmenthostbytheJVMmaximumsize.SubtractthismemoryfromthephysicalRAMwhencalculatingthevaluefortheGreenplumDatabasegp_vmem_protect_limit serverconfigurationparameter.SeetheGreenplumDatabaseserverconfigurationparameter gp_vmem_protect_limit intheGreenplumDatabaseReferenceGuideforrecommendedmemorycalculationformulasorvisittheGPDBVirtualMemoryCalculator website.

ApacheSolrrequiresaZooKeeperclusterwithatminimumthreenodes(fivenodesrecommended).Youcaninstalla“binding”ZooKeeperclusterwithGPTextontheGreenplumclusterhosts,oryoucanuseanexistingZooKeepercluster.WhendeployedalongsideGreenplumDatabasesegments,ZooKeeperperformancecanbeaffectedunderheavydatabaseload.Forbestperformance,installaZooKeeperclusteronseparatehostswithnetworkconnectivitytotheGreenplumnetwork.

©CopyrightPivotalSoftware,Inc,2013-2017 10 2.1.2

Page 11: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

NewFeaturesandEnhancementsinGPText2.1Previously,allZooKeepernodesrunningonthesamehostwrotemessagestothesamelogfile.Noweachnodewritestoalogfileinitsowndatadirectory.

GPTextwasunabletoparse“ - ”and“ _ ”charactersinthe ZOO_HOSTS configurationparameterinthe gptext_install_config file.Thisisfixed.

The gptext-state managementutilityhasanew -D ( --details )option.Thedefaultformatfortheutility’soutputischangedtodisplaycountsofindexesintheGreen,Yellow,andRedstates.Withthe -D option,thecommandincludestheGreen,Yellow,orRedstatusofeachGPTextindex.Theutilityreturnsanon-zerostatusifanyindexesarenotintheGreenstate.

OptimizationsandchangeshavebeenmadetomakestartingupaGPTextclusterwithalargenumberofindexesmorestable:

Thedefaultvalueofthe replication_factor configurationparameterischangedfrom3to2.The gptext-start utilityhasanew --slow-start ( -s )optionthatperformsa“rolling-start,”startinguponeindexatatime.ThisoptionshouldbeusedwhenstartingaGPTextclusterthathasalargenumberofindexes.Ifyouhavemorethan50indexesandomitthisoption,theutilitypromptsyoutoconfirmthatyouwanttoperformafaststartup.OneGPTextinstanceoneachhostintheclusterisassignedtheoverseerroletohelpreducedemandonZooKeeper.The coreLoadThreads inthe solr.xml configurationfileisincreasedto512.The solr.jetty.threads.max parameterinthe jetty.xml fileforeachSolrbinaryisincreasedto30000.

AddedsupportforindexingGreenplumDatabasepartitionedtables.

ABoolean partitioned columnisaddedtotheresultsofthe gptext.index_status() function.The gptext-state -i <index_name> commandincludesalistofchildpartitionindexesforpartitionedtablesinalldatabases.Anew gptext.partition_status() functionlistschildpartitionindexesinthecurrentdatabase.

GPTextnodesarenolongerrequiredtorunonGreenplumDatabasesegmenthosts.GPTextnodescanbedeployedonseparatehostswithconnectivitytotheGreenplumnetwork,andpasswordless-sshaccessforthegpadminuserbetweenallGreenplumDatabaseandGPTextnodes.TheGPTextinstallerandthe gptext-uninstall , gptext-expand ,and gptext-recover managementutilitiesareupdatedtoaccommodatethischange.

Addedautility, gptext_migrator ,tomigrateaGPTextsystemintoaGreenplumDatabaseclusterthathasbeenupgradedtoanewGPDBrelease.MigratingdoesnotchangetheGPTextversion;itmovesthecurrentGPTextsystemtoadifferentGreenplumDatabasecluster.MigratingaGPTextsystemfromGPDB4.3.xtoGPDB5.xisnotcurrentlysupported.

TheGPText2.1installercanupgradeaGPText2.0systemtothenewGPTextrelease.Ifrequested,theinstallerrunsanewutility, gptext-upgrade ,toupgradetheGPTextsystem.The gptext-upgrade utilitycanalsoberunseparatelyfromtheinstaller.

KnownIssuesFollowingareknownissuesinGPText.Workaroundsareprovidedwhenavailable.

WildcardsinGPTextSearchOptionsSolrdoesnotreturnallfieldswhenthe fl Solrsearchoptioncontainsawildcardthatmatchesfieldnames.Forexample,givenatablewithcolumnscontenta and contentb ,specifying fl=contenta,contentb,(sum,1,1) correctlyreturnsthreefields.Specifying fl=cont*,sum(1,1) correctlyreturns contenta andcontentb ,butomitsthepseudo-field sum(1,1) .

Specifyingawildcardtomatchallfields( fl=*,sum(1,1) )alsoomitsthepseudo-field.

IndexLoadFailureAfterConfigurationFileErrorIfSolrfailstoloadanindexbecauseofaconfigurationfileerror,andthentheindexisdroppedwithoutfirstcorrectingtheconfigurationfileerror,theindexcannotberecreateduntilGPTextisrestarted.Thiscanhappenifyouedit managed-schema or solrconfig.xml andintroduceanXMLsyntaxerrororatypoinconfigurationvalues.

Workaround:

1. Whenanindexfailstoload,checktheSolrlogtofindthecause.

2. Ifthecauseisaconfigurationfileerror,suchasinvalidXML,usethe gptext-config utilitytoeditthefileandfixtheerror.Droppingtheindexwithoutfirstcorrectingtheerrorisnotrecommended.

3. Ifyouhavedroppedanindexthatfailedtoloadwithoutfirstcorrectingthecauseofthefailure,youmustrestartGPTextbeforeyoucanrecreatetheindex.Run gptext-start -r torestartGPText.

©CopyrightPivotalSoftware,Inc,2013-2017 11 2.1.2

Page 12: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

StartupFailurewithLargeNumbersofIndexesWhenthereisalargenumberofSolrcores,SolrCloudcanfailtorestartsuccessfully,witherrormessagesindicatingfailuretoelectleadersforshards.ThisisaknownSolrissue;seehttps://issues.apache.org/jira/browse/SOLR-5990 intheApacheSolrJiraforanexample.Becauseofthisissue,itisrecommendedtoavoiddesigningGPTextapplicationsthatcreatelargenumbersofindexes,shards,andreplicas.Thenumberofcoresyoucancreatebeforeyouobservethisbehaviorishardwaredependent,soyoushouldtesttodetermineyoursystem’slimits.Youcancreateandsuccessfullyoperatealargernumbersofindexesthancanberestartedsuccessfullylater,sobesuretotestrestartingGPTexttodetermineapracticallimit.

SettingGPTextConfigurationParametersWithoutFirstSettingcustom_variable_classesIfthe custom_variable_classes GreenplumDatabaseserverconfigurationparameterdoesnotincludethevalue“gptext”,attemptingtosetaGPTextconfigurationparameterreturnsanerrormessage,forexample:

mydb-#setgptext.replication_factor=4;WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)ERROR:unrecognizedconfigurationparameter"gptext.replication_factor"

InGPText2.0,inadditiontotheerrormessage,thevalueoftheconfigurationparameterpersistedinZooKeeperiszero,replacingthepreviousvalueoftheparameter.

mydb-#showgptext.replication_factor;gptext.replication_factor----------------------------0

BeginningwithGPText2.1,theerrormessageisstillgenerated,howeverthevaluesavedinZooKeeperisthevaluespecifiedinthe set command,4intheprecedingexample.

Topreventtheerrormessage,beforesettinganyGPTextconfigurationparameters,usethe gpconfig command-lineutilitytosetthe custom_variable_classes

configurationparameter:

$gpconfig-ccustom_variable_classes-v'gptext'

©CopyrightPivotalSoftware,Inc,2013-2017 12 2.1.2

Page 13: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

InstallingGPText

PrerequisitesTheGPTextinstallationincludestheinstallationofApacheSolrCloudand,optionally,ApacheZooKeeper.

IfyouareinstallinganewGPTextreleaseintoanexistingGPTextsystem,followtheinstructionsinUpgradingGPTextinstead.

FollowingareGPTextinstallationprerequisites.

InstallandconfigureyourGreenplumDatabasesystem,version4.3.6orhigher.SeethePivotalGreenplumDatabaseInstallationGuideathttps://gpdb.docs.pivotal.io .

GPTextrunsonRedHatEnterpriseLinux5.xor6.x.

GPTextcannotbeinstalledontoasharedNFSmount.

InstallOracle JRE 1.8.x onallhostsinthecluster.

Ensurethat nc (netcat)isinstalledonallGreenplumclusterhosts( yum install nc ).

Installing lsof onallclusterhostsisrecommended( sudo yum install lsof ).

GPTextnodescanbeinstalledontheGreenplumDatabaseclusterhostsalongsidetheGreenplumsegmentsoronadditional,non-databasehostsaccessibleontheGreenplumclusternetwork.AllhostsparticipatingintheGPTextsystemmusthavethesameoperatingsystemandconfigurationandhavepasswordless-sshaccessforthegpadminuser.SeethePivotalGreenplumDatabaseInstallationGuideforinstructionstoconfigurehosts.

IfyouplantoplaceGPTextnodesontheGreenplumDatabasesegmenthosts,ensurethatyoureservememoryforGPTextusewhenyouconfigureGreenplumDatabase.TodeterminethememorytosetasideforGPText,multiplythenumberofGPTextnodestocreateoneachGreenplumsegmenthostbytheJVMmaximumsize.SubtractthismemoryfromthephysicalRAMwhencalculatingthevaluefortheGreenplumDatabasegp_vmem_protect_limit serverconfigurationparameter.SeetheGreenplumDatabaseserverconfigurationparameter gp_vmem_protect_limit intheGreenplumDatabaseReferenceGuideforrecommendedmemorycalculationformulasorvisittheGPDBVirtualMemoryCalculator website.

ApacheSolrrequiresaZooKeeperclusterwithatminimumthreenodes.Youcaninstalla“binding”ZooKeeperclusterwithGPTextontheGreenplumclusterhosts,oryoucanuseanexistingZooKeepercluster.WhendeployedalongsideGreenplumDatabasesegments,ZooKeeperperformancecanbeaffectedunderheavydatabaseload.Forbestperformance,installaZooKeeperclusterwithatleastthreenodes(fivenodesrecommended)onseparatehostswithnetworkconnectivitytotheGreenplumnetwork.

InstalltheGPTextBinaries1. OntheGreenplummasterhost,extracttheGPTextdistributionfile,acompressedtararchive.Forexample:

cd/home/gpadmintarxvfzgreenplum-text-release-rhel5_x86_64.tar.gz

Thereleasedirectorycontainsaninstallationconfigurationfile, gptext_install_config ,andtheGPTextinstallationbinary,whichhasanamesimilartogreenplum-text-<version>-<platform>.bin ,forexample, greenplum-text-2.1.0-rhel5_x86_64.bin .

2. Ifnecessary,grantexecutepermissiontotheGPTextbinary.Forexample:

chmod+x/home/gpadmin/greenplum-text-2.1.0-rhel5_x86_64.bin

3. IfyouareinstallingGPTextinadirectorythatisonlyaccessibletoroot,forexample /usr/local ,performthesesteps:

a. Createtheinstallationdirectoryasrootandchangetheownershiptothegpadminuser.b. Toinstalltoadirectorywheretheusermayormaynothavewritepermissions:

Usegpsshtocreateadirectorywiththesamefilepathonallhosts( mdw , smdw ,andthesegmenthosts sdw1 , sdw2 ,andsoon).Forexample:

/usr/local/<gptext-version>

Asroot,setthefilepermissionsandowner.Forexample:

GPTextrequiresOracleJDK1.8.x.YoucannotuseanOpenJDKJREwithGPText.

©CopyrightPivotalSoftware,Inc,2013-2017 13 2.1.2

Page 14: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

#chmod775/usr/local/<gptext-version>#chowngpadmin:gpadmin/usr/local/<gptext-version>

4. Editthe gptext_install_config filetosetparametersfortheinstallation.SeeSetInstallationParametersfordetails.

5. RuntheGPTextinstallationbinaryas gpadmin onthemasterserver:

./gptext-version.bin-cgptext_install_config

6. AcceptthePivotallicenseagreement.

SetInstallationParametersAGPTextconfigurationfilenamed gptext_install_config containsparameterstoconfiguretheGPTextinstallation.Editthefileandsettheparametersasdescribedinthefollowingtable.

Table1.GPTextinstallationparameters

Parameter Description Example

GPTEXT_HOSTS

AnarrayofhostnamesonwhichtoinstallGPText,orusetheconstant"ALLSEGHOSTS" toinstallGPTextonallGreenplumDatabasesegmenthosts.GPTexthostsmustbepasswordlessssh-accessiblebythegpadminuserfromallotherhostsintheGreenplumCluster.

declare -a GPTEXT_HOSTS=(gptext_host1 gptext_host2 gptext_host3)

GPTEXT_HOSTS="ALLSEGHOSTS"

DATA_DIRECTORY

AnarrayofdirectorypathswhereGPTextdatadirectoriesaretobecreated.ThenumberofdirectoriesinthearraydeterminesthenumberofGPTextnodesthatwillbecreatedoneachphysicalhost.If GPTEXT_HOSTS listsmultipleinterfacesperhost,theGPTextnodesarespreadevenlyacrosstheinterfaceaddresses.

declare -a DATA_DIRECTORY=(/data/primary /data/primary)

JAVA_OPTSSetstheminimumandmaximummemoryeachSolrCloudJVMcanuse.

JAVA_OPTS="-Xms1024M -Xmx2048M"

GPTEXT_PORT_BASE

GP_MAX_PORT_LIMIT

SetarangeofportnumbersavailabletoGPTextnodes.GPTextfindsunusedportsinthespecifiedrange.

GPTEXT_PORT_BASE=18983GP_MAX_PORT_LIMIT=28983

ZOO_CLUSTER

WhethertodeployaGPTextbindingZooKeeperclusteroruseanexistingZooKeepercluster.Ifsetto "BINDING"theinstallationdeploysaZooKeepercluster.TouseanexistingZooKeepercluster,setthisparametertoalistofZooKeepernodesintheformat"host1:port,host2:port,host3:port “.

ZOO_CLUSTER="BINDING"

If ZOO_CLUSTER issetto "BINDING" ,thisparameterisanarrayofthehostswheretheZooKeepernodesaretobeinstalled.

The GPTEXT_HOSTS and DATA_DIRECTORY installationparametersdeterminethenumberofGPTextnodesthataredeployed.Thenumberofdirectoriesincludedinthe DATA_DIRECTORY arrayisthenumberofGPTextnodesthatarecreatedperhost.The GPTEXT_HOSTS parameterdeterminesthenumberofhosts.Ifsettotheconstant "ALLSEGHOSTS" thenumberofGPTextnodehostsisthesameasthenumberofGreenplumsegmenthosts.If GPTEXT_HOSTS issettoanarrayofhostnames,thelengthofthearrayisthenumberofGPTextnodehosts.ThetotalnumberofGPTextnodesshouldnotexceedthenumberofGreenplumDatabaseprimarysegments.

©CopyrightPivotalSoftware,Inc,2013-2017 14 2.1.2

Page 15: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

ZOO_HOSTS

Thearraymustcontain3,5,or7hostnames,forexampleZOO_HOSTS=(sdw1 sdw2 swd3 sdw4sdw5)

.IfyouareusingasinglehostforZooKeeper,specifyitmultipletimes,forexample, ZOO_HOSTS=(sdw1 sdw1 swd1) .

declare -a ZOO_HOSTS=(localhost localhost localhost localhostlocalhost)

ZOO_DATA_DIRTheZooKeeperdatadirectory,requiredwhen ZOO_CLUSTER issetto "BINDING" .

ZOO_DATA_DIR="/data/master/"

ZOO_GPTXTNODE

ThenodepathinZooKeeperforGPText.ThisparameterisrequiredwhetherZOO_CLUSTER issetto "BINDING" oralistofhosts.

ZOO_GPTXTNODE="gptext"

ZOO_PORT_BASE

ZOO_MAX_PORT_LIMIT

ArangeofportnumberstousefortheZooKeepercluster.Unusedportsareallocatedfromwithinthisrange.Therangemustcontainatleast4000portnumbers.

ZOO_PORT_BASE=2188ZOO_MAX_PORT_LIMIT=12188

GPTEXT_JAVA_HOME

ThehomedirectoryoftheJavainstallationtorunforZooKeeperandSolrprocesses.Ifnotset,theJREspecifiedinthe PATH andJAVA_HOME environmentvariableswillbeused.

GPTEXT_JAVA_HOME=/usr/java/jdk1.8.0_131

Parameter Description Example

StartingGPTextFirst,makesuretheGPTextcommand-lineutilitiesareinyourpathbysourcingtheGreenplumDatabaseandGPTextenvironmentscripts.ItisimportanttosourcetheGPTextenvironmentscripteachtimeyousourcetheGreenplumDatabasescript.Forexample:

source/usr/local/greenplum-db-<version>/greenplum_path.shsource/usr/local/greenplum-text-<version>/greenplum-text_path.sh

TouseGPTextinadatabase,youmustfirstusethe gptext-installsql managementutilitytoinstalltheGPTextuser-definedfunctionsandotherobjectsinthedatabase:

gptext-installsqldatabase[database2...]

TheGPTextobjectsarecreatedinthe gptext schema.

TheZooKeeperclustermustberunningbeforeyoustartGPText.IfyouinstalledaboundZooKeepercluster,startitwiththe zkManager command-lineutility.

$zkManagerstart

StartGPTextwiththe gptext-start utility.

$gptext-start

UninstallingGPTextTouninstallGPText,runthe gptext-uninstall utility.YoumusthavesuperuserpermissionsonalldatabaseswithGPTextschemastorun gptext-uninstall .

gptext-uninstall runsonlyifthereisatleastonedatabasewithaGPTextschema.

Execute:

gptext-uninstall

©CopyrightPivotalSoftware,Inc,2013-2017 15 2.1.2

Page 16: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

©CopyrightPivotalSoftware,Inc,2013-2017 16 2.1.2

Page 17: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

UpgradingGPTextUpgradingaGPTextsystemtoanewGPTextreleaseinstallsthenewGPTextsoftwarereleaseonallhostsintheGreenplumclusterandthenupgradestheGPTextsystem.

UpgradingGPTextandGreenplumDatabaseattheSameTimeIfyouareupgradingtonewreleasesofGreenplumDatabaseandGPTextatthesametime,followthesesteps:

1. CompletetheGreenplumDatabaseupgradefirstandensurethedatabaseisoperational.

2. RuntheGPText gptext-migrator utilitytomigrateyourcurrentGPTextsystemtothenewlyupgradedGreenplumDatabasesystem.

3. EnsurethatthecurrentversionofGPTextworkswiththenewGreenplumDatabaseversion.

4. ProceedwiththeGPTextupgrade.

UpgradingaGPTextReleaseUpgradingaGPTextreleaseisatwo-partprocess:installthenewsoftwarereleaseontheGreenplumclusterhostsandthenupgradetheexistingGPTextsystem.TheGPTextinstallerperformsthefirstpart,installingthenewsoftware.The gptext-upgrade utilityperformsthesecondpart,upgradingthecurrentGPTextsystemtothenewversion.

TheGPTextinstallerdetectsanexistingGPTextsystemand,afterinstallingthenewsoftwarerelease,offerstorunthe gptext-upgrade utilityforyou.IfyouchoosetoupgradetheGPTextsystemlater,youcanrunthe gptext-upgrade utilityyourself.

AllupgradetasksareexecutedontheGreenplummasterhostasthe gpadmin user.The gpadmin usermusthavewritepermissioninthedirectorywherethenewGPTextreleaseistobeinstalled, /usr/local/greenplum-text-<release>-<version> bydefault.

YoumuststoptheGPTextsystembeforeupgrading.TheGreenplumDatabaseandZooKeeperclustersmustberunning.

Followthesesteps:

1. DownloadthenewGPTextreleaseforyourplatformfromPivotalNetwork .

2. Extractthereleasepackage.

$tarxfzgreenplum-text-<version>-<platform>.tar.gz

3. ShutdowntheGPTextsystem.

$gptext-stop

4. RuntheGPTextinstaller.

$./greenplum-text-<version>-<platform>.bin

5. TheinstallerpromptsyoutoacceptthePivotallicenseagreementandtochooseandcreatetheinstallationdirectory.

6. Theinstallerverifiestheenvironmenttoensurethatprerequisitesarepresent,suchasPythonandJava.Ifanyproblemsarediscovered,theinstalleroutputsanerrormessageandstops.Correcttheproblemidentifiedbythemessageandruntheinstalleragain.

7. AfterthenewsoftwarehasbeeninstalledontheGreenplumcluster,theinstallerlooksforanexistingGPTextinstallation.IfanexistingGPTextsystemisfound,theinstallerasksifyouwishtoupgradeGPTextdirectly.

Ifyouansweryes,theinstallerrunsthe gptext-upgrade script.The gptext-upgrade utilityvalidatestheenvironmenttoensureitcancompletetheupgrade,thenexecutestheupgradeandrestartstheGPTextsystem.Ifanyproblemsarediscovered, gptext-upgrade outputsamessageandquits.Fixtheindicatedproblemsandrunthegptext-upgradeutility(at <NEW_GPTEXTHOME>/bin/gptext-upgrade )tocompletetheGPTextsystemupgrade.

WhenupgradingGPText,youdonotspecifyaninstallationconfigurationfileasyoudofortheinitialGPTextinstallation.

©CopyrightPivotalSoftware,Inc,2013-2017 17 2.1.2

Page 18: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Ifyouanswerno,youmustrunthe gptext-upgrade scriptaftertheinstallercompletes.Seethegptext-upgradeutilityreferenceforinstructions.

8. Aftertheupgradehascompleted,sourcethe greenplum-text_path.sh inthenewGPTextreleasedirectoryandrun gptext-statehealthcheck toverifytheGPTextsystem:

$source/usr/local/greenplum-text-<version>/greenplum-text_path.sh$gptext-statehealthcheck

©CopyrightPivotalSoftware,Inc,2013-2017 18 2.1.2

Page 19: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

UsingPivotalGPTextIntroductiontoPivotalGPText

WorkingWithGPTextIndexes

QueryingGPTextIndexes

CustomingGPTextIndexes

AdministeringGPText

GPTextHighAvailability

Glossary

©CopyrightPivotalSoftware,Inc,2013-2017 19 2.1.2

Page 20: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

IntroductiontoPivotalGPTextPivotalGPTextenablesprocessingmassquantitiesofrawtextdata(suchassocialmediafeedsore-maildatabases)intomission-criticalinformationthatguidesbusinessandprojectdecisions.GPTextjoinstheGreenplumDatabasemassivelyparallel-processingdatabaseserverwithApacheSolrCloudenterprisesearchandtheMADlibAnalyticsLibrarytoprovidelarge-scaleanalyticsprocessingandbusinessdecisionsupport.GPTextincludesfreetextsearchaswellassupportfortextanalysis.GPTextsupportsbusinessdecisionmakingbyoffering:

Multiplekindsofdata:GPTextsupportsbothsemi-structuredandunstructureddatasearches,whichexponentiallyincreasesthekindsofinformationyoucanfind.

Lessschemadependence:GPTextdoesnotrequirestaticschemastosuccessfullylocateinformation;schemascanchangeorbequitesimpleandstillreturntargetedresults.

Textanalytics:GPTextsupportsanalysisoftextdatawithmachinelearningalgorithms.TheMADlibanalyticslibraryisintegratedwithGreenplumDatabaseandisavailableforusewithGPText.

Thischaptercontainsthefollowingtopics:

GPTextSystemArchitecture

GPTextSampleUseCase

GPTextWorkflow

TextAnalysis

GPTextSystemArchitectureGPTextcombinesaGreenplumDatabaseclusterwithanApacheSolrCloudcluster.GreenplumDatabasesegmentsandGPTextnodescanbedeployedonthesamehostsorondifferenthostswithnetworkconnectivity.

ThefollowingfigureshowstheprocessarchitectureofthecombinedGreenplumDatabaseandApacheSolrclusters.ThefigureshowsfourclusternodeswithfourGreenplumsegmentsandfourSolrinstancesdeployedoneach.AnApacheZooKeeperservicemanagestheSolrCloudcluster.BecauseZooKeeperismostefficientwithanoddnumberofservers,ZooKeepernodesaredeployedonthreeofthefourhosts.GreenplumDatabaseusersaccessSolrCloudservicesviaGPTextuser-definedfunctionsinstalledinGreenplumdatabasesandcommand-lineutilities.

ThefigureomitstheGreenplummasterhost,secondarymaster,andmirrorsegmentsfortheGreenplumprimarysegments.

©CopyrightPivotalSoftware,Inc,2013-2017 20 2.1.2

Page 21: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

TheGreenplumsegments,Solrinstances,andZooKeepernodesmayallbedeployedonseparatehostsonthesamenetwork,dependingonapplicationandperformancerequirements.

ThefollowingsectionsdescribehowGPTextintegratesSolrCloudwithGreenplumDatabaseandhowthetwoclustersworktogethertoprovideparalleltextsearchcapabilitiesinGreenplumDatabaseandmaintainhighavailability.

GreenplumDatabaseClusterAGreenplumDatabaseclusteriscomprisedofthefollowingcomponents:

Amasterdatabaseinstance,executingonadedicatedhost,conventionallynamed mdw .(Notillustrated)

Asecondarymasterinstance,onahostconventionallynamed smdw ,actingasawarmstandbyforthemasterinstance.(Notillustrated)

Anarrayofdatabaseprimarysegmentinstancesandmirrorsdeployedonsegmenthosts,byconvention sdw1 through sdwn .AsegmentinstanceisanindependentPostgresdatabaseprocessmanagingaportionofthedistributeddata.Eachsegmenthasamirror(notillustrated)onanotherhostintheclustertoprovideuninterruptedserviceincaseofasegmentorsegmenthostfailure.Thenumberofprimarysegmentsperhostisdeterminedbythehardwareconfiguration—thenumberandtypeofprocessorcores,theamountofphysicalRAM,localstoragecapacity,andnetworkcapacity—aswellasavailabilityandperformancerequirements.

TheGreenplummasterinstancecoordinatestheworkofthesegmentinstances.OptimalperformanceofaGreenplumDatabaseclusterrequiresthatallsegmenthostsbeconfiguredidenticallywiththesamenumberofprimaryandmirrorsegmentsoneach,andwiththedatabasedatadistributedevenlyamongthesegmentinstances.Thefullcapacityofthedatabaseclusterisutilizedwheneverysegmenthostperformsanequalamountofwork.

ApacheSolrCloudApacheSolrisaserverprovidingaccesstoApacheLucenefull-textindexes.ApacheSolrCloudisahighlyavailable,faulttolerantclusterofApacheSolrservers.ThetermGPTextclusterisanotherwaytorefertoaSolrCloudclusterdeployedbyGPTextforusewithaGreenplumDatabasesystem.

ASolrCloudclusteriscomprisedofthefollowingcomponents:

AnApacheZooKeeperclustertomanagetheSolrCloudcluster.SolrCloudusesZooKeepertomanageserverconfigurationandtocoordinatethecluster’sactivities.AZooKeeperclustershouldhaveanoddnumberofservers.GPTextcaninstallaZooKeeperclusterthatisboundtotheGPTextcluster,oritcanshareanexistingZooKeepercluster.IfGPTextinstallstheZooKeepercluster,itcanbemanagedusingGPTextfunctionsandutilities.

MultipleSolrCloudserverinstancesdeployedontheGreenplumsegmenthostsoronotherhostsonthesamenetwork.EachinstanceisaJVMprocessrunningSolrserver.SolrCloudinstancesuselocalstorage,whichmaybethesamelocalstoragevolumesthatstoreGreenplumDatabasedata.ThenumberofSolrCloudinstancesperhostcanbethesameasthenumberofGreenplumprimarysegmentsperhost,butthisisnotarequirement.ThenumberofinstancestoexecuteperhostisspecifiedduringGPTextinstallation.

GPTextprovidesdocumentindexingandsearchcapabilitiesforGreenplumDatabasebyaddinguser-definedfunctions(UDFs)thataccessSolrAPIsfromwithindatabasequeries.

GPTextUDFsperformthefollowingtasks:

createandmanageGPTextindexes

insertdocumentsintoindexes

searchindexes

TherearealsoGPTextUDFsandcommand-lineutilitiestoconfigure,monitor,andmanagetheSolrCloudclusterandtomanagereplicas,SolrCloud’shigh-availabilitymechanism.(Moreonreplicasinthenextsection.)

ParallelisminGPTextIndexingandSearchingSolrClouddistributesdocumentindexesinslicescalledshards.WithGPText,thenumberofshardsforanindexisthesameasthenumberofGreenplumsegments,soeachGreenplumsegmentoperatesonanequalportionoftheindex.EachshardismanagedbyaSolrCloudinstanceandtheshardsaredistributedevenlyamongtheSolrCloudinstances.TheSolrCloudinstanceandGreenplumsegmentarenotrequiredtobeonthesamehost.

HighAvailabilityforGPTextIndexesSolrCloudprovideshighavailabilitybymaintainingreplicasofshardsandprovidingautomaticfailoverifashardfailsorbecomesunavailable.Onereplicaofeachshardistheleadreplicaandanychangestoitareappliedtotheotherreplicas.Thereplicationfactor,whichdeterminesthenumberofreplicasto

©CopyrightPivotalSoftware,Inc,2013-2017 21 2.1.2

Page 22: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

maintainforeachshard,issetwhentheindexiscreated.ReplicasmayalsobeaddedordroppedlaterusingGPTextUDFsorcommand-lineutilities.

ZooKeeperdeterminesthelocationsofshardreplicasamongtheSolrnodesandhosts.WhenaddingareplicausingaGPTextUDForcommand-lineutility,anewshardcanbeexplicitlyplacedonaSolrCloudinstance.

GPTextSampleUseCaseForensicfinancialanalystsneedtolocatecommunicationsamongcorporateexecutivesthatpointtofinancialmalfeasanceintheirfirm.Theanalystsusethefollowingworkflow:

1. LoadtheemailrecordsintoaGreenplumdatabase.

2. CreateaSolrindexoftheemailrecords.

3. Runqueriesthatlookfortextstringsandtheirauthors.

4. Refinethequeriesuntiltheypairadummycompanynamewithtopthreeorfourexecutivescorrespondingaboutsuspectoffshorefinancialtransactions.Withthisdata,theanalystscanfocustheinvestigationonspecificindividualsratherthanthethousandsofauthorsintheinitialdatasample.

GPTextWorkflowGPTextworkswithGreenplumDatabaseandApacheSolrCloudtostoreandindexbigdataforinformationretrieval(query)purposes.High-levelworkflowsincludedataloadingandindexing,anddataquerying.

Thistopicdescribesthefollowinginformation:

DataLoadingandIndexingWorkflow

QueryingDataWorkflow

DataLoadingandIndexingWorkflowThefollowingdiagramshowstheGPTextworkflowforloadingandindexingdata.

AllclientinteractionwiththesystemisthroughtheGreenplummasterinstance.

1. LoaddataintoyourGreenplumDatabasesystem.Createadatabasetabletoholddataandthenaddthedatatothetable.Greenplumprovidesparalleldataloadingutilitiesandprotocolsthathelptotransformandloadexternaldatainvariousformatsandfromvarioussources.Fordetails,seetheGreenplumDatabaseAdministratorGuide,athttp://gpdb.docs.pivotal.io .

2. CreateanemptyGPTextindex.

©CopyrightPivotalSoftware,Inc,2013-2017 22 2.1.2

Page 23: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Usethe gptext.create_index() user-definedfunction(UDF)tocreateanemptyGPTextindexforthetable.EachGreenplumsegmentwillmanageasliceoftheindex,calledashard.SolrCloudcreatesmultiplereplicasforeachshard,distributedamongtheSolrinstances,andchoosesaleadreplicafortheGreenplumsegmenttooperateupon.Solrmanagesreplicationbetweenthereplicas.

3. Populatetheindexwithdatafromthedatabasetable.Usethe gptext.index() UDFtoadddatatotheindex.ThisUDFworksbydispatchingaSQLquerytoexecuteoneachGreenplumsegment.ThesegmentsexecutethequeryandaddtheresultstotheirshardsusingSolrAPIs.

4. Commitchangestotheindex.CommitchangestotheGPTextindexbycallingthe gptext.commit_index() UDF.Untilthechangesarecommitted,queriesexecutedontheindexcannotaccessanydataaddedtotheindexwith gptext.index() .Ifneeded,uncommittedchangescanberolledback.SolrCloudreplicateschangescommittedtotheleadreplicatotheshards’non-leadreplicas.

QueryingDataWorkflowThefollowingdiagramshowsthehigh-levelGPTextqueryprocessworkflow:

1. AusersubmitsaSQLquerydesignedtosearchtheindexeddata.AGPTextsearchqueryisaSQL SELECT statementonaGPTextsearchUDFthatcontainsfull-textsearchexpressions.

2. TheGreenplummasterdispatchesthequerytotheGreenplumsegments.

3. Eachsegmentexecutesthequery,usingtheSolrAPItosearchitsindexshard.SolrCloudexecutesthesearchqueryontheleadreplicafortheshard.

4. TheGreenplumsegmentsreturntheresultsofthesearchquerytotheGreenplummaster.

5. TheGreenplummasteraggregatestheresultsfromallsegmentsandreturnsthemtotheclient.

TextAnalysisGPTextenablesanalysisofSolrindexeswithApacheMADlib,anopensourcelibraryforscalablein-databaseanalytics.MADlibprovidesdata-parallelimplementationsofmathematical,statistical,andmachinelearningmethodsforstructuredandunstructureddata.YoucanuseGPTexttoperformavarietyofMADlibanalyses.

LearnmoreaboutApacheMADlibathttp://madlib.apache.org .A gppkg packageforMADlibisavailableonthePivotalnetworkathttp://network.pivotal.io .

©CopyrightPivotalSoftware,Inc,2013-2017 23 2.1.2

Page 24: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

WorkingWithGPTextIndexesIndexingpreparesdocumentsfortextanalysisandfastqueryprocessing.ThistopicshowsyouhowtocreateGPTextindexesandadddocumentstothem,andtomaintainandcustomizeindexesforyourownapplications.

ToworkwithGPTextindexes,youmustfirstsetupyourenvironmentandaddtheGPTextschematothedatabasecontainingthedocuments(GreenplumDatabasedata)youwanttoindex.

SettheGPTextcommand-lineenvironmentTosettheenvironment,loginasthe gpadmin userandsourcetheGreenplumDatabaseandGPTextenvironmentscripts.TheGreenplumDatabaseenvironmentmustbesetbeforeyousourcetheGPTextenvironmentscript.Forexample,ifbothGreenplumDatabaseandGPTextareinstalledinthe/usr/local/ directory,enterthesecommands:

$source/usr/local/greenplum-db-<version>/greenplum_path.sh$source/usr/local/greenplum-text-<version>/greenplum-text_path.sh

Withtheenvironmentnowset,youcanaccesstheGPTextcommand-lineutilities.

AddtheGPTextschematoadatabaseUsethe gptext-installsql utilitytoaddtheGPTextschematodatabasescontainingdatayouwanttoindexwithGPText.Youperformthistaskonetimeforeachdatabase.Inthisexample,the gptext schemaisinstalledintothe wikipedia database.Enterthefollowingcommandstocreatethe wikipediadatabaseandinstalltheGPTextschema.

$createdbwikipedia$gptext-installsqlwikipedia

The gptext schemaprovidesuser-definedtypes,tables,views,andfunctionsforGPText.ThisschemaisreservedforGPText.Ifyoucreateanynewobjectsinthe gptext schema,theywillbelostwhenyoureinstalltheschemaorupgradeGPText.

CreatingGPTextIndexesandIndexingDataThegeneralstepsforcreatingaGPTextindexandindexingdocumentsare:

1. CreateanemptySolrindex

2. Populatetheindex

3. Committheindex

Afteryoucompletethesesteps,youcancreateandexecuteasearchqueryorimplementmachinelearningalgorithms.SearchingGPTextindexesisdescribedintheQueryingGPTextIndexestopic.

ThefollowingstepsarecompletedbyexecutingSQLcommandsandGPTextfunctionsinthedatabase.RefertotheGPTextFunctionReferencefordetailsabouttheGPTextfunctionsdescribedinthefollowingexamples.

Toopenaninteractiveshellforexecutingqueriesinthe wikipedia database,execute:

$psqlwikipedia

Theexamplesinthissectionuseatablecalled articles inthe wikipedia database.Thetableiscreatedinthedefault public schemawiththefollowingstatement.

©CopyrightPivotalSoftware,Inc,2013-2017 24 2.1.2

Page 25: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

CREATETABLEarticles(idint8primarykey,date_timetimestamptz,titletext,contenttext,refstext)DISTRIBUTEDBY(id);

CreateanemptyGPTextindexAGPTextindexisanApacheSolrcollectioncontainingdocumentsaddedfromaGreenplumDatabasetable.TherecanbeoneGPTextindexperGreenplumDatabasetable.EachrowinthedatabasetableisadocumentthatcanbeaddedtotheGPTextindex.

Ifthedatabasetableispartitioned,thereisoneGPTextindexforallpartitions.Youmustspecifytheroottablenamewhencreatingtheindexandaddingdocumentstoit.GPTextprovidessearchsemanticsthatenablesearchingpartitionsefficiently.

The gptext.create_index() functioncreatesanewGPTextindex.Thisfunctionhastwosignatures:

gptext.create_index(<schema_name>,<table_name>,<id_col_name>,<def_search_col_name>[,<if_check_id_uniqueness>])

or

gptext.create_index(<schema_name>,<table_name>,<p_columns>,<p_types>,<id_col_name>,<def_search_col_name>[,<if_check_id_uniquess>])

The <schema_name> and <table_name> argumentsspecifythedatabasetablethatcontainsthesourcedocuments.

The <id_col_name> argumentisthenameofthetablecolumnthatcontainsauniqueidentifierforeachrow.The <id_col_name> columncanbeoftypeint4 , int8 , varchar , text ,or uuid .

The <def_search_col_name> argumentisthenameofthetablecolumnthatcontainsthecontentyouwanttosearchbydefault.Forexample,ifyouwanttoindexandsearchjustthe <content> column,youcanusethefirstsignatureandspecifythe content columnnameinthe <def_search_col_name> argument:

SELECT*FROMgptext.create_index('public','articles','id','content');

Thefinal,optionalargument, <if_check_id_uniqueness> ,isaBooleanargument.Whentrue,thedefault,attemptingtoaddadocumentwithanidthatalreadyexistsintheindexgeneratesanerror.Ifyousettheargumenttofalse,youcanadddocumentswiththesameid,butwhenyousearchtheindexalldocumentswiththesameIDarereturned.

Ifyouwanttoindexandsearchmultiplecolumns,youcanusethesecond gptext.create_index() signature,specifyingthecolumnstoindexinthe<p_columns> argumentandthedatatypesofthosecolumnsinthe <p_types> argument.The <p_columns> and <p_types> argumentsaretextarrays.Theidcolumnanddefaultsearchcolumnmustbeincludedinthearrays.

ThisexamplecreatesanindexyoucanuseyoutorunGPTextsearchesonthe title , content ,and refs columns.

SELECT*FROMgptext.create_index('public','articles','{id,title,content,refs}','{long,text,text,text}','id','content',true);

Notethattheidcolumnanddefaultsearchcolumnarestillspecifiedinseparateargumentsfollowingthearrays.

Youcandownloadtestdataforthistablehere .Right-clickthelinkandsavethefileandthencopyittothegpadminuser’shomedirectory.Loadthedataintothe articles tableusingthe psql\COPY metacommand:

\COPY public.articles(id, date_time, title, content, refs) FROM '/home/gpadmin/articles.csv' HEADER CSV;

Thedatacontains23articlesfromWikipedia.org.

©CopyrightPivotalSoftware,Inc,2013-2017 25 2.1.2

Page 26: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

=#SELECT*FROMgptext.create_index('public','articles','{id,title,content,refs}',id,'{long,text,text,text}','content',true);INFO:Createdindexwikipedia.public.articlescreate_index--------------t(1row)

Toverifytheindexiscreated,enterthisquery:

=#SELECT*FROMgptext.index_status('wikipedia.public.articles');content_id|index_name|shard_name|shard_state|replica_name|replica_state|core|node_name|base_url|is_leader|partitioned------------+---------------------------+------------+-------------+--------------+---------------+-------------------------------------------+------------------------------+-------------------------------------+-----------+-------------0|wikipedia.public.articles|shard0|active|core_node2|active|wikipedia.public.articles_shard0_replica1|gpsne.localdomain:18984_solr|http://gpsne.localdomain:18984/solr|t|f0|wikipedia.public.articles|shard0|active|core_node4|active|wikipedia.public.articles_shard0_replica2|gpsne.localdomain:18983_solr|http://gpsne.localdomain:18983/solr|f|f1|wikipedia.public.articles|shard1|active|core_node1|active|wikipedia.public.articles_shard1_replica2|gpsne.localdomain:18983_solr|http://gpsne.localdomain:18983/solr|f|f1|wikipedia.public.articles|shard1|active|core_node3|active|wikipedia.public.articles_shard1_replica1|gpsne.localdomain:18984_solr|http://gpsne.localdomain:18984/solr|t|f(4rows)

ThisexampleexecutedonaGreenplumDatabaseclusterwithtwoprimarysegments.Twoshardswerecreated,oneforeachsegment,andeachshardhastworeplicas.Thereplicasarenamedcore_node1throughcore_node4.

Youcanalsorunthe gptext-state-D

command-lineutilitytoverifytheindexwascreated.Seethegptext-statereferencefordetails.

PopulatetheindexTopopulatetheindex,usethetablefunction gptext.index() ,whichhasthefollowingsyntax:

SELECT*FROMgptext.index(TABLE(SELECT*FROM<table_name>),<index_name>);

Forexample:

SELECT*FROMgptext.index(TABLE(SELECT*FROMarticles),'wikipedia.public.articles');dbid|num_docs------+----------3|112|12(2rows)

Theresultsofthiscommandshowthat23documentsfromtwosegmentswereaddedtotheindex.

Thefirstargumentofthe gptext.index() functionisa“table-valuedexpression.” TABLE(SELECT*FROMarticles) createsa“table-valuedexpression”fromthearticlestable,usingthetablefunction TABLE .

Youcanchoosethedatatoindexorupdatebychangingtheinnerselectlistinthequerytoselecttherowsyouwanttoindex.Whenaddingnewdocumentstoanexistingindex,forexample,specifya WHERE clauseinthe gptext.index() calltochooseonlythenewrowstoindex.

Theinner SELECT statementcouldalsobeaqueryonadifferenttablewiththesamestructure,oraresultsetconstructedwithanarbitrarilycomplexjoin,providedthecolumnsspecifiedinthe gptext.create_index() functionarepresentintheresults.Ifyouindexdatafromasourceotherthanthetableusedtocreatetheindex,besurethedistributionkeyfortheresultsetmatchesthedistributionkeyofthebasetable.TheGreenplumDatabase SELECT

statementhasa SCATTERBY clausethatyoucanusetospecifythedistributionkeyfortheresultsfromaquery.SeeSpecifyingadistributionkeywithSCATTERBYformoreaboutthedistributionpolicyandGPTextindexes.

Committheindex

©CopyrightPivotalSoftware,Inc,2013-2017 26 2.1.2

Page 27: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Afteryoucreateandpopulateanindex,youcommittheindexusing gptext.commit_index(<index_name>) .

Thisexamplecommitsthedocumentsaddedtotheindexinthepreviousexample.

=#SELECT*FROMgptext.commit_index('wikipedia.public.articles');commit_index--------------t(1row)

Anynewdataaddedtotheindexsincethelastcommitiscommittedtotheindex.

ManagingGPTextIndexesGPTextprovidescommand-lineutilitiesandfunctionsyoucanusetoperformtheseGPTextmanagementtasks:

Configuringanindex

Optimizinganindex

SpecifyingadistributionpolicywithSCATTERBY

Deletingfromanindex

Droppinganindex

Addingafieldtoanindex

Droppingafieldfromanindex

Listingallindexes

ConfiguringanindexYoucanmodifyyourindexingbehaviorgloballybyusingthe gptext-config utilitytoeditasetofindexconfigurationfiles.Thefilesyoucaneditwithgptext-config are:

solrconfig.xml –ContainsmostoftheparametersforconfiguringSolritself(seehttp://wiki.apache.org/solr/SolrConfigXml ).

managed-schema –DefinestheanalysischainsthatSolrusesforvariousdifferenttypesofsearchfields(seeTextAnalysisChains).

stopwords.txt –Listswordsyouwanttoeliminatefromthefinalindex.

protwords.txt –Listsprotectedwordsthatyoudonotwanttobemodifiedbytheanalysischain.Forexample,iPhone.

synonyms.txt –Listswordsthatyouwantreplacedbysynonymsintheanalysischain.

elevate.xml –Movesspecificwordstothetopofyourfinalindex.

emoticons.txt –Definesemoticonsforthe text_sm socialmediaanalysischain.(seeTheemoticons.txtfile).

Youcanalsouse gptext-config tomovefiles.

OptimizinganindexThefunction gptext.optimize_index(<index_name>,<max_segments>) mergesallsegmentsintoasmallnumberofsegments( <max_segments> )forincreasedefficiency.

Example:

=#SELECT*FROMgptext.optimize_index('wikipedia.public.articles',10);optimize_index----------------t(1row)

SpecifyingadistributionpolicywithSCATTERBY

©CopyrightPivotalSoftware,Inc,2013-2017 27 2.1.2

Page 28: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Thefirstparameterof gptext.index() isatable-valuedexpression,suchas TABLE(SELECT*FROMarticles) .ThequeryinthisparametermusthavethesamedistributionpolicyasthetableyouareindexingsothatdocumentsaddedtotheindexareassociatedwiththecorrectGreenplumDatabasesegments.Somequeries,however,havenodistributionpolicyortheyhaveadifferentdistributionpolicy.Thiscouldhappenifthequeryisajoinoftwoormoretablesoraqueryonanintermediate(staging)tablethatisdistributeddifferentlythanthebasetablefortheindex.

Tospecifyadistributionpolicyforaqueryresultset,theGreenplumDatabaseSELECTstatementhasa“SCATTERBY”clause.

TABLE(SELECT*FROMarticlesSCATTERBY<distrib_id>)

where distrib_id isthesamedistributionkeyusedtodistributethebasetablefortheindex.

DeletingfromanindexYoucandeletefromanindexusingaquerywiththefunction gptext.delete(<index_name>,<query>) .Thisdeletesfromtheindexalldocumentsthatmatchthesearchquery.Todeletealldocuments,usethequery '*' .

Afterasuccessfuldeletion,execute gptext.commit_index(<index_name>) tocommitthechange.

Thisexampledeletesalldocumentscontaining "toxin" inthedefaultsearchfield.

=#SELECT*FROMgptext.delete('wikipedia.public.articles','toxin');delete--------t(1row)

SELECT*FROMgptext.commit_index('wikipedia.public.articles');

Examplethatdeletesalldocumentsfromtheindex:

SELECT*FROMgptext.delete('wikipedia.public.articles','*:*');

SELECT*FROMgptext.commit_index('wikipedia.public.articles');

DroppinganindexYoucancompletelyremoveanindexwiththe gptext.drop_index(<index_name>) function.

Example:

SELECT*FROMgptext.drop_index('wikipedia.public.articles');

AddingafieldtoanindexYoucanaddafieldtoanexistingindexusingthe gptext.add_field() function.Forexample,youcanaddafieldtotheindexafteracolumnisaddedtotheunderlyingdatabasetableoryoucanaddafieldtoindexacolumnthatwasnotspecifiedwhentheindexwascreated.

GPTextmapstheGreenplumDatabasefieldtypetoanequivalentSolrdatatypeautomatically.SeeGPTextandSolrDataTypeMappingsforatableofdatatypemappings.

©CopyrightPivotalSoftware,Inc,2013-2017 28 2.1.2

Page 29: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

CREATETABLEarticles(idint8primarykey,date_timetimestamptz,titletext,contenttext,refstext)DISTRIBUTEDBY(id);

SELECT*FROMgptext.create_index('public','articles','id','content',true);...populatetheindex...SELECT*FROMgptext.commit_index('wikipedia.public.articles');

ALTERTABLEarticlesADDnotestext;SELECT*FROMgptext.add_field('wikipedia.public.articles','notes',false,false);SELECT*FROMgptext.reload_index('wikipedia.public.articles');

AddingafieldtoaGPTextindexrequiresthebasetabletobeavailable.Ifyoudropthetableaftercreatingtheindex,youcannotaddfieldstotheindex.

DroppingafieldfromanindexYoucandropafieldfromanexistingindexwiththe gptext.drop_field() function.Afteryouhavedroppedfields,call gptext.reload_index() toreloadtheindex.

Example:

SELECT*FROMgptext.drop_field('wikipedia.public.articles','notes');SELECT*FROMgptext.reload_index('wikipedia.public.articles');

ListingallindexesYoucanlistallindexesintheGPTextclusterusingthe gptext-state command-lineutility.Forexample:

$gptext-state-D20170822:10:11:23:029752gptext-state:gpsne:gpadmin-[INFO]:-ExecuteGPTextstate...20170822:10:11:23:029752gptext-state:gpsne:gpadmin-[INFO]:-Checkzookeeperclusterstate...20170822:10:11:23:029752gptext-state:gpsne:gpadmin-[INFO]:-CheckGPTextclusterstatus...20170822:10:11:23:029752gptext-state:gpsne:gpadmin-[INFO]:-CurrentGPTextVersion:2.1.220170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:-Allnodesareupandrunning.20170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------20170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:-Indexstatedetails.20170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------20170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:-databaseindexnamestate20170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:-wikipediawikipedia.public.articlesGreen20170822:10:11:28:029752gptext-state:gpsne:gpadmin-[INFO]:-Done.

StoringFieldContentinanIndexSolrcanstorethecontentsofcolumnsintheindexsothatresultsofasearchontheindexcanincludethecolumncontents.Thismakesitunnecessarytojointhesearchqueryresultswiththeoriginaltable.Youcanevenstorethecontentsofdatabasecolumnsthatarenotindexedandreturnthatcontentwithsearchresults.GPTextpackstheadditionalfieldsinabufferaddedtothesearchresults.Individualfieldscanberetrievedfromthisbufferusingthegptext.gptext_retrieve_field() , gptext.gptext_retrieve_field_int() ,and gptext.gptext_retrieve_field_float() functions.

OneGPTextdesignpatternistostoreallofatable’scolumnsinaGPTextindex.Allofthetable’scontentcanberetrievedwithGPTextqueries,sothedatabasetablecanthenbetruncatedordroppedtoeliminateredundantdiskstorage.AdditionaldocumentscanbeaddedtotheGPTextindexlaterbyinsertingthemintothetruncatedtable,orintoatemporarytablewiththesamestructure,andthenaddingthemtotheindexwiththe gptext.index()function.

Entriesforeachfieldinthe managed-schema fileforaGPTextindexdetermineifthefieldisindexedorstored.Youcaneditthe managed-schema filetoadd,remove,ormodifytheentriesforeachfield.

Followthesestepstoconfigureanindextostoreadditionalfields.

1. Use gptext-config toeditthe managed-schema file.

gptext-config-iwikipedia.public.articles-fmanaged-schema

©CopyrightPivotalSoftware,Inc,2013-2017 29 2.1.2

Page 30: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

2. Findthe <field> elementsforthecolumnsyouwanttostoreintheindex.Notethat <field> elementswithnamesbeginningwithanunderscoreareinternalfieldsandshouldnotbemodified.Thefieldsnamed“id”and“content”inthisexamplecoorespondtothe <id_col> and<def_search_col_name> columnsinthe gptext.create_index() function.

<fieldname="__pk"type="long"indexed="true"stored="true"/><fieldname="__temp_field"type="text"multiValued="true"indexed="true"stored="false"/><fieldname="_version_"type="long"indexed="true"stored="true"/><fieldname="content"type="text"indexed="true"stored="false"/><fieldname="id"type="long"indexed="true"stored="true"/><fieldname="refs"type="text"indexed="true"stored="false"/><fieldname="title"type="text"indexed="true"stored="false"/>

3. Addanew <field> elementforanyadditionaltablecolumnyouwanttostoreintheGPTextindex.Setthe type attributeoftheelementtoatypethatmatchesthedatabasecolumntype.(Thevalidtypesaredefinedby <fieldType> elementsinthesame managed-schema file.)

4. Foreachfieldyouwanttostoreintheindex,changethe stored attributefrom "false" to "true" .

<fieldname="title"stored="true"type="text"indexed="true"/><fieldname="content"stored="true"type="text"indexed="true"/><fieldname="refs"stored="true"type="text"indexed="true"/>```

5. Savethefileand,ifanydocumentswerealreadyaddedtotheindex,reindexthetable.SeeRetrievingAdditionalStoredFieldsforinformationaboutaddingadditionalstoredfieldstotheresultsofaGPTextquery.

Formoreaboutthecontentsofthe managed-schema fileandadditionalwaystocustomizeGPTextindexesseeCustomizingGPTextIndexes.

CreatingaGPTextindexforaGreenplumDatabasepartitionedtableWhenyoucreateaGPTextindexforapartitionedGreenplumDatabasetable,GPTextcreatesanindexwitha __partition field.Whenyouadddocumentstotheindex,GPTextsavesthechildpartitiontablenameinthe __partition field,allowingyoutocreateGPTextqueriesthatsearchandfilterbypartition.

UnlikeGreenplumDatabase,whichmanageschildpartitionsinseparatedatabasetables,GPTextdoesnotcreateaseparateSolrcollectionforeachdatabasepartitionbecausethelargernumberofSolrcorescouldadverselyaffectthecapacityandperformanceoftheSolrcluster.

CreatingaGPTextindexforapartitionedtableandaddingdocumentstotheindexisnodifferentthancreatinganindexforanunpartitionedtable.

Thefollowing CREATETABLE statementcreatesa tweets tablepartitionedbymonth.

CREATETABLEtweets(idbigint,createdtimestamp,is_retweetboolean,retweetsint,tweettext)DISTRIBUTEDBY(id)PARTITIONBYRANGE(created)(START(DATE'2009-01-01')INCLUSIVEEND(DATE'2018-01-01')EXCLUSIVEEVERY(INTERVAL'1month'));

ThisstatementcreatesaGPTextindexforthe tweets table.

SELECT*FROMgptext.create_index('public','tweets','id','tweet',true);

Notethatyoumustsupplythenameoftherootpartition, tweets inthisexample,orthe gptext.create_index() functionwillissueanerrormessage.

Thiscommandaddsrowsfromthe tweets tabletotheGPTextindex.

SELECT*FROMgptext.index(TABLE(SELECT*FROMtweets),'demo.public.tweets');

Committhechangestotheindex:

SELECT*FROMgptext.commit_index('demo.public.tweets');

SeeSearchingPartitionedTablesforexamplesofsearchingpartitionedtables.

©CopyrightPivotalSoftware,Inc,2013-2017 30 2.1.2

Page 31: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

QueryingGPTextIndexesToretrievedata,yousubmitaquerythatperformsasearchbasedoncriteriathatyouspecify.Simplequeriesreturnstraight-forwardresults.Youcanusethedefaultqueryparser,orspecifyadifferentqueryparseratquerytime.

CreatingaSimpleSearchQueryAfteraSolrindexiscommitted,youcanrunquerieswiththe gptext.search() function,whichhasthissyntax:

gptext.search(<src_table>,<index_name>,<search_query>,<filter_queries>[,<options>])

ANDsearchexamplewithtop5resultsThissearchfindsdocumentsthatcontainbothsearchterms“solar”and“battery”.The 'rows=5' argumentisaSolroptionthatspecifiesthetop5resultsaretobereturnedfromeachsegment.InaGreenplumDatabaseclusterwithtwosegments,thisqueryreturnsupto10rows.

SELECTa.id,a.date_time,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','solarANDbattery',null,'rows=5')qWHEREq.id=a.idORDERBYscoreDESC;id|date_time|title|score----------+------------------------+---------------------+-----------13690575|2017-08-2402:34:00-05|Solarpower|2.71286582008322|2017-08-0502:09:00-05|Vehicle-to-grid|2.58101534711003|2017-08-1018:56:00-05|Osmoticpower|2.207300725784|2017-08-2607:10:00-05|Renewableenergy|2.1295567213555|2017-08-2712:48:00-05|Solarupdrafttower|2.021064827743|2017-08-2015:56:00-05|Solarenergy|1.6916461608623|2017-08-2703:56:00-05|Ethanolfuel|1.4619896(7rows)

SeeSolroptionsformoreaboutSolroptions.

ORsearchexamplewithtop5resultsByusingtheORkeyword,thissearchmatchesmoredocumentsthantheANDexample.Thetotalnumberofrowsreturnedislimitedbythe rows=5 Solroption.

SELECTa.id,a.date_time,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','solarORbattery',null,'rows=5')qWHEREq.id=a.idORDERBYscoreDESC;id|date_time|title|score---------+------------------------+---------------------+-----------2008322|2017-08-0502:09:00-05|Vehicle-to-grid|2.581015325784|2017-08-2607:10:00-05|Renewableenergy|2.12955672120798|2017-01-2800:59:00-06|Lithiumeconomy|2.0416002213555|2017-08-2712:48:00-05|Solarupdrafttower|2.021064827743|2017-08-2015:56:00-05|Solarenergy|1.6916461608623|2017-08-2703:56:00-05|Ethanolfuel|1.4619896533423|2017-08-2800:52:00-05|Solarwaterheating|1.02390722988035|2017-03-1206:39:00-05|Vortexengine|0.9519546113728|2017-08-1509:59:00-05|Geothermalenergy|0.680103555017|2017-08-2819:24:00-05|Fusionpower|0.6432224(10rows)

Searchnon-defaultfields

©CopyrightPivotalSoftware,Inc,2013-2017 31 2.1.2

Page 32: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

AGPTextindexhasadefaultsearchcolumn,specifiedwhentheindexiscreatedwiththe gptext.create_index() function.Ifyouhaveincludedadditionalcolumnstoindex,youcanreferencetheminyourqueries.Thisquerysearchesfordocumentswiththeword“solar”inthe title column.

SELECTa.id,a.date_time,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','title:"solar"',null,null)qWHEREq.id=a.idORDERBYscoreDESC;id|date_time|title|score----------+------------------------+---------------------+-----------13690575|2017-08-2402:34:00-05|Solarpower|1.654772927743|2017-08-2015:56:00-05|Solarenergy|1.6547729533423|2017-08-2800:52:00-05|Solarwaterheating|1.1132113213555|2017-08-2712:48:00-05|Solarupdrafttower|1.1132113(4rows)

Thisexamplesearchesforarticlesthathave“photosynthesis”inthe content columnbutthatdonothave“solar”inthe title column.

SELECTa.id,a.date_time,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','photosynthesisand-title:solar',null,null)qWHEREq.id=a.idORDERBYscoreDESC;id|date_time|title|score----------+------------------------+------------------+-----------25784|2017-08-2607:10:00-05|Renewableenergy|2.972095553716476|2017-08-2820:40:00-05|Seaweedfuel|1.424022114205946|2017-08-2808:46:00-05|Algaefuel|1.3022419608623|2017-08-2703:56:00-05|Ethanolfuel|0.7614042(4rows)

FilteringsearchresultsAfilterqueryappliesafiltertotheresultsreturnedbythequery.The <filter_queries> argumentofthe gptext.search() functionisanarray,soyoucanapplymultiplefilterstothesearchresults.

Thefollowingexamplefindsarticlesthathavetheword“nuclear”inthe content columnandthenappliestwofilterqueriestoremovearticlesthathave“solar”inthe title columnandarticlesthatdonothave“power”inthe title column.

SELECTa.id,a.date_time,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','nuclear','{-title:solar,title:power}',null)qWHEREq.id=a.idORDERBYscoreDESC;id|date_time|title|score----------+------------------------+------------------+------------14090587|2017-08-1414:00:00-05|Low-carbonpower|1.189789755017|2017-08-2819:24:00-05|Fusionpower|1.175360913021878|2017-08-0905:03:00-05|Geothermalpower|0.99499804(3rows)

CreatingaFacetedSearchQueryFacetingbreaksqueryresultsintomultiplecategoriesandshowscountsforeachcategory.

Withthe faceted_field_search() function,thecategoriesarefieldnames.Hereisthesyntaxforthe faceted_field_search() function:

faceted_field_search(index_name,query,filter_queries,facet_fields,facet_limit,minimum)

©CopyrightPivotalSoftware,Inc,2013-2017 32 2.1.2

Page 33: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Inthisexample,thequeryrunsonasetofsocialmediafeedsandeliminatesspamandtruncatedfields:

SELECT*FROMgptext.faceted_field_search('twitter.public.message','*:*',null,'{spam,truncated}',0,0)

Inthisexample,thequerysearchesalltweetsinthedatasetbyusers( author_id s)whohavecreatedatleastfivetweets:

SELECT*FROMgptext.faceted_field_search('twitter.public.message','*:*',null,'{author_id}',-1,5);

HighlightingSearchTermsinQueryResultsThe gptext.highlight() functionhighlightssearchtermsbyinsertingmarkuptagsbeforeandaftereachoccurrenceoftheterminresultsreturnedfromagptext.search() query.Forexample,ifthesearchtermis“iphone”,eachoccurrenceof“iphone”inthedataismarkedup:

<em>iphone</em>

Youcanchangethedefaultmarkupstringsfrom <em> and </em> bysettingthe gptext.hl_pre_tag and gptext.hl_post_tag serverconfigurationparameters.

Theindexmusthavebeencreatedwithtermsenabled.Use gptext.enable_terms() toenabletermvectorsandthereindexthedataifitwasalreadyindexed.See gptext.enable_terms() inthePivotalGPTextUser’sGuide.

Important:Highlightingaddsoverheadtothequery,includingindexspace,indexingtime,andsearchtime.

The gptext.highlight() syntaxis:

gptext.highlight(column_data,column_name,offsets)

The column_data parametercontainsthetextdatathatwillbemarkedupwithhighlightingtags.

The column_name parameteristhenameofthecorrespondingtablecolumn.

The offsets parameterisa gptexthstore typethatcontainskey-valuepairsthatspecifythelocationsofthesearchterminthetextdata.Thisvalueisconstructedbythe gptext.search() functionwhenhighlightingisenabled.Thekeycontainsthecolumnnameandthevalueisacomma-separatedlistofoffsetswherethedataappears.

Toenablehighlightinginagptext.search()query,addthe hl and hl.fl options:

hl=true&hl.fl=field1,field2

Settingthe hl=true optionenableshighlightingforthesearch.The hl.fl optionspecifiesalistofthefieldnamestohighlight.

Example-Top10results,withsearchtermshighlighted

SELECTt.id,gptext.highlight(t.message,'message',s.hs)FROMtwitter.messaget,gptext.search(TABLE(SELECT1SCATTERBY1),'demo.twitter.message','{!gptextqp}iphone',null,'rows=10&hl=true&hl.fl=message')sWHEREt.id=s.id::int8;

SearchingPartitionedTablesAGPTextindexforapartitionedGreenplumDatabasetablehasa __partition fieldthatcontainsthenameofthechildpartition.Whenyouquerytheindex,youcanusethe __partition fieldtorestrictthepartitionstosearch.

©CopyrightPivotalSoftware,Inc,2013-2017 33 2.1.2

Page 34: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Searchallpartitionsinanindexbycalling gptext.search() withtherootpartitionname:

SELECT*FROMgptext.search(TABLE(SELECT1SCATTERBY1),'demo.public.tweets','birthday',null);

Youcansearchasinglepartitionbycalling gptext.search() withthechildpartitionname.Usethe gptext.index_status(<index_name>) functiontoseethepartitionnames.Forexample:

SELECT*FROMgptext.partition_status('demo.public.tweets');partition_name|inherits_name|level------------------------------+--------------------+-------demo.public.tweets_1_prt_1|demo.public.tweets|1demo.public.tweets_1_prt_2|demo.public.tweets|1demo.public.tweets_1_prt_3|demo.public.tweets|1demo.public.tweets_1_prt_4|demo.public.tweets|1demo.public.tweets_1_prt_5|demo.public.tweets|1demo.public.tweets_1_prt_6|demo.public.tweets|1demo.public.tweets_1_prt_7|demo.public.tweets|1demo.public.tweets_1_prt_8|demo.public.tweets|1demo.public.tweets_1_prt_9|demo.public.tweets|1demo.public.tweets_1_prt_10|demo.public.tweets|1...

Thisexamplespecifiesasinglepartitiontosearch:

SELECT*FROMgptext.search(TABLE(SELECT1SCATTERBY1),'demo.public.tweets_1_prt_6','birthday',null);

Youcanalsospecifyapartitionnameorarangeofpartitionsinthequeryfilterargumentofthe gptext.search() function.Thisexamplesearchesthetenpartitionsbetween tweets_1_prt_11 and tweets_1_prt_20 .

SELECT*FROMgptext.search(TABLE(SELECT1SCATTERBY1),'demo.public.tweets','birthday','{''[__partition:[tweets_1_prt_11TOtweets_1_prt_20]''}');

RetrievingAdditionalStoredFieldsAGPTextindexcanbeconfiguredtostoreadditionalfieldswhendocumentsareindexedwiththe gptext.index() function.TheadditionalstoredfieldscanbereturnedwithsearchresultssothatitisunnecessarytojoinGPTextsearchresultswiththeoriginaltableintheGreenplumDatabase.

SeeStoringAdditionalFieldsinanIndexforinstructionstoconfiguretheindextostoretheadditionalfields.

RetrievetheadditionalfieldvaluesinaGPTextsearchbyspecifyingthelistoffieldsinthe gptext.search() optionsargument.Inthisexample,thewikipedia.public.articles indexisconfiguredtostorethefields id , date_time ,and title .

SELECT*FROMgptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','+grid+economy',null,'fl=id,date_time,title');id|score|hs|rf----------+-----------+----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------13690575|2.1322436||column_value{name:"id"value:"13690575"}column_value{name:"date_time"value:"2017-08-24T08:34:00Z"}column_value{name:"title"value:"Solarpower"}27743|1.9176841||column_value{name:"id"value:"27743"}column_value{name:"date_time"value:"2017-08-20T21:56:00Z"}column_value{name:"title"value:"Solarenergy"}113728|1.3315439||column_value{name:"id"value:"113728"}column_value{name:"date_time"value:"2017-08-15T15:59:00Z"}column_value{name:"title"value:"Geothermalenergy"}14205946|1.0533204||column_value{name:"id"value:"14205946"}column_value{name:"date_time"value:"2017-08-28T14:46:00Z"}column_value{name:"title"value:"Algaefuel"}533423|2.3232973||column_value{name:"id"value:"533423"}column_value{name:"date_time"value:"2017-08-28T06:52:00Z"}column_value{name:"title"value:"Solarwaterheating"}25784|2.2123294||column_value{name:"id"value:"25784"}column_value{name:"date_time"value:"2017-08-26T13:10:00Z"}column_value{name:"title"value:"Renewableenergy"}7906908|1.9307073||column_value{name:"id"value:"7906908"}column_value{name:"date_time"value:"2017-08-28T06:00:00Z"}column_value{name:"title"value:"Biomass"}(7rows)

©CopyrightPivotalSoftware,Inc,2013-2017 34 2.1.2

Page 35: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Toretrieveallfieldsstoredintheindex,usethe * wildcardforthefieldlist: 'fl=*' .

Intheresults,therequestedfieldsarepackedintoanextrafieldnamed rf withtheformat:

column_value{name:"<field1_name>"value:"<field1_value>"}[column_value{name:"<field2_name>"value:"<field2_value>"}]...

TheGPTextfunction gptext.gptext_retrieve_field(rf,<column_name>) retrievesasinglefieldvaluefromthisstructureasatextvalue.Therearevariationstoretrievethefieldvaluesas int or float values.Ifthespecifiedfieldnamedoesnotexistinthe rf structure,thefunctionreturns NULL .Thisexampleshowshowtousethesefunctionstounpackthesearchresultsintoseparatecolumns.

SELECTid,score,gptext.gptext_retrieve_field_int(rf,'id')id,to_timestamp(gptext.gptext_retrieve_field(rf,'date_time'),'YYYY-MM-DDHH24:MI:SS')date,gptext.gptext_retrieve_field(rf,'title')titleFROMgptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','+grid+economy',null,'fl=*');id|score|id|date|title----------+-----------+----------+------------------------+---------------------533423|2.3232973|533423|2017-08-2806:52:00-05|Solarwaterheating25784|2.2123294|25784|2017-08-2613:10:00-05|Renewableenergy7906908|1.9307073|7906908|2017-08-2806:00:00-05|Biomass13690575|2.1322436|13690575|2017-08-2408:34:00-05|Solarpower27743|1.9176841|27743|2017-08-2021:56:00-05|Solarenergy113728|1.3315439|113728|2017-08-1515:59:00-05|Geothermalenergy14205946|1.0533204|14205946|2017-08-2814:46:00-05|Algaefuel(7rows)

UsingAdvancedQueryingOptionsWhenyousubmitaquery,Solrprocessesthequeryusingaqueryparser.ThereareseveralSolrqueryparserswithdifferentcapabilities.Forexample,theComplexPhraseQueryParser canparsewildcards,andthe SurroundQueryParser supportsspanqueries:findingwordsinthevicinityofasearchwordinadocument.

Youcanusethemostappropriateparserforyourquery(seeChangingtheQueryParseratQueryTime).

GPTextsupportsthesequeryparsers:

1. QParserPlugin ,thedefaultGPTextqueryparser. QParserPlugin isasupersetofthe LuceneQParserPlugin ,Solr’snativeLucenequeryparser.QParserPlugin isageneralpurposequeryparserwithbroadcapabilities. QParserPlugin doesnotsupportspanqueriesandhandlesoperatorprecedenceinanunintuitivemanner.Thesupportforfieldselectionisalsoratherweak.Seehttp://wiki.apache.org/solr/SolrQuerySyntax .

2. The ComplexPhraseQueryParser ,whichsupportswildcards,ORs,ranges,andfuzziesinsidephrasequeries.Seehttps://issues.apache.org/jira/browse/SOLR-1604 .

3. The SurroundQueryParser ,whichsupportsthefamilyofspanqueries.Seehttp://wiki.apache.org/solr/SurroundQueryParser .

4. The DisMax (or eDisMax )QueryParser,whichhandlesoperatorprecedenceinanintuitivemannerandisbestsuitedforuserqueries.Seehttp://wiki.apache.org/solr/DisMaxQParserPlugin .

5. TheUnifiedQueryParsercanuseallthequeryparsersincombination.SeeUsingtheUniversalQueryParserformoreinformation.

Agoodgeneralreferenceforqueryparserscanbefoundat:http://www.lucidimagination.com/blog/2009/02/22/exploring-query-parsers .

ChangingtheQueryParseratQueryTimeYoucanchangethequeryparseratquerytimewiththe defType Solroptioninthe gptext.search() functionthatsupportsSolroptions.Forexample,thisqueryusesthe dismax queryparsertoreturnthetop100resultsfor“olympics”,withnofilterquery,andusingthedefaultsearchfield:

Top100results,nofilterquery,defaultsearchfieldused:

©CopyrightPivotalSoftware,Inc,2013-2017 35 2.1.2

Page 36: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

SELECTw.title,q.scoreFROMwikipediaw,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','olympics',null,'rows=100&defType=dismax')qWHEREw.id=q.id::int8;

The options parameterincludes‘defType=dismax’.

InvoketheUniversalQueryParserbyinserting {!gptextqp}<searchquery> inthequeryparameterofthe gptext.search() function.

UsingtheSolrlocalParamssyntaxYoucanusetheSolr localParams syntaxwithallGPTextsearchfunctionstochangethequeryparseratquerytimebyreplacingthe <query> termwith'{!type=dismax}<query>' .

Note:Thedefaultqueryparserisspecifiedinthe requestHandler definitionsin solrconfig.xml .Youcanedit solrconfig.xml withthemanagementutilitygptext-config .

UsingtheUniversalQueryParserWiththeUniversalQueryParser,youcanperformsearchesusingfeaturesofanyofthequeryparsersinonesearchstring.InvoketheUniversalQueryParserwiththe gptext.search() function’s search_query parameterinthisformat:

'{!gptextqp}search_query'

Example-ThefollowingqueryillustratesusingtheUniversalQueryParser.

SELECTw.title,q.scoreFROMwikipediaw,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','{!gptextqp}mobil*2W(ObamaORClinton)',null)qWHEREw.id=q.id::int8;

Thesearchqueryinthisexampleincludessyntaxforthreequeryparsers:

mobil* –ComplexQuerywithwildcard

2W –ProximityQuery,requestingamaximumoftwowordsproximitybetweenthephrases

Obama OR Clinton –LuceneQuery

©CopyrightPivotalSoftware,Inc,2013-2017 36 2.1.2

Page 37: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

CustomizingGPTextIndexesWhenyoucreateaGPTextindex,Solrcreatesasetofconfigurationfilesfortheindex.The managed-schema filecontainsdefinitionsforthefields,fieldtypes,andtextanalyzersthatdefinethecontentsandbehavioraGPTextindex.

AfieldmapsaGreenplumDatabasetablecolumnstoafieldintheGPTextindex.

AfieldtypeassignsSolrclassesthathandleadatatypetoafield.

AnanalyzerchainisasetofJavaclassesthattokenizeandfilterthecontentofafieldtobeindexed.

Thistopicprovidesanoverviewofthecontentsofthe managed-schema fileandtherelationshipsbetweentheXMLelementsthatdefinefields,fieldtypes,ananalyzerchains.Byeditingthe managed-schema file,youcanspecifyatthefieldlevelhowGreenplumDatabasedataisstoredandindexed.

Fordetaileddocumentation,refertothecommentsinthe managed-schema fileortotheApacheSolrClouddocumentation.

Editingthemanaged-schemaFileYoucaneditconfigurationfilesusingthe gptext-config utility.Thefollowingcommandloadsthe managed-schema fileforanindexintoaneditor:

$gptext-config-i<index-name>-fmanaged-schema

FieldElementsGPTextadds field elementstothe managed-schema fileforcolumnsincludedwhentheindexwascreatedwiththe gptext.create_index() function.Thisexampleisthedefinitionforatextfieldnamed description :

<fieldname="description"stored="false"type="text_intl"indexed="true"/>

The name attributeisthenameofthedatabasecolumn.IfthecolumnnameisnotavalidSolrfieldname,itisalteredtoconform.

The stored attributedeterminesifthecontentofthefieldwillbestoredintheindex.Ifthefieldisstoredintheindex,GPTextsearchresultscanreturnthecontentofthefield.Iftheattributeisnotstored,retrievingthefieldcontentrequiresaSQLjoin.

The type attributemapstheGreenplumDatabasetypetoaSolrtype,definedinthesamefilewitha <fieldType> element.

The indexed attributedetermineswhetherthefieldcontentwillbeindexed.

The <field> elementcanhaveadditionalattributesusedwithsometypes.Seethecommentafterthe <fields> elementforacompletelistofattributes.

FieldTypesThe type attributeofthe <field> elementdetermineshowSolrstoresthefieldinindexes.The class attributemapsthefieldtypetoaSolrJavaclassthatrecognizesandprocessesthedatatype.Solrincludesmanybasefieldtypes.SeeGPTextandSolrDataTypeMappingsforamappingofSolrtypestoGreenplumDatabasetypes.Ifyouhaveacustomtype,youcanaddnewfieldtypesbyimplementingSolrJavatypeinterfaces,oryoucanspecifyanexistingbasetypeandcustomizeitwithananalyzerchain,asdescribedinthenextsection.

AnalyzerChainsTextanalysischainsdeterminehowSolrindexestextfieldsinadocument.AnanalyzerchainisasetofJavaclassesthattokenizeandfilterthecontentbeforeitisaddedtotheindex.Differentanalysischainscanbedefinedforindexingandqueryingoperations.

Fieldanalysisbeginswithatokenizerthatdividesthecontentsofafieldintotokens.InLatin-basedtextdocuments,thetokensarewords(alsocalledterms).InChinese,Japanese,andKorean(CJK)documents,thetokensarecharacters.

Thetokenizercanbefollowedbyoneormorefiltersexecutedinsuccession.Filtersrestrictthequeryresults,forexample,byremovingunnecessaryterms(“a”,“an”,“the”),convertingtermformats,orbyperformingotheractionstoensurethatonlyimportant,relevanttermsappearintheresultset.Eachfilteroperatesontheoutputofthetokenizerorfilterthatprecedesit.Solrincludesmanytokenizersandfiltersthatallowanalyzerchainstoprocessdifferentcharactersets,languages,andtransformations.SeeAnalyzers,TokenizersandFilters-Thefulllist foracomprehensivelist.

©CopyrightPivotalSoftware,Inc,2013-2017 37 2.1.2

Page 38: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Fieldtypesareassignedanalyzersinanindex’s managed-schema file.ThefollowingexampleshowstheSolr text fieldtypespecification:

<fieldTypename="text"class="solr.TextField"positionIncrementGap="100"autoGeneratePhraseQueries="true"><analyzertype="index"><tokenizerclass="solr.WhitespaceTokenizerFactory"/><!--inthisexample,wewillonlyusesynonymsatquerytime<filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="false"/>--><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.WordDelimiterFilterFactory"generateWordParts="1"generateNumberParts="1"catenateWords="1"catenateNumbers="1"catenateAll="0"splitOnCaseChange="1"/><filterclass="solr.LowerCaseFilterFactory"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="solr.PorterStemFilterFactory"/></analyzer><analyzertype="query"><tokenizerclass="solr.WhitespaceTokenizerFactory"/><filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="true"/><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.WordDelimiterFilterFactory"generateWordParts="1"generateNumberParts="1"catenateWords="0"catenateNumbers="0"catenateAll="0"splitOnCaseChange="1"/><filterclass="solr.LowerCaseFilterFactory"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="solr.PorterStemFilterFactory"/></analyzer></fieldType>

Ananalyzerhasonlyonetokenizer, solr.WhitespaceTokenizerFactory inthisexample.Thetokenizercanbefollowedbyoneormorefiltersexecutedinsuccession.

Filtersrestrictthequeryresults.Eachfilteroperatesontheoutputofthetokenizerorfilterthatprecedesit.Forexample,the solr.StopFilterFactory filterremovesunnecessaryterms(“a”,“an”,“the”)fromthestreamoftokens.Thewordstofilteroutofthestreamarelistedinthe stopwords.txt configurationfile.Youcaneditthe stopwords.txt filewiththe gptext-config utilitytochangethelistofwordsexcludedfromtheindex.

Thereareseparateanalyzertypesforindexandqueryoperations.Thequeryanalyzerchaininthisexampleincludesa solr.SynonymFilterFactory thatlooksupeachtokeninafile synonyms.txt and,iffound,returnsthesynonyminplaceofthetoken.

Theanalysischaincanincludea“stemmer”, solr.PorterStemFilterFactory inthisexample.Thestemmeremploysanalgorithmtochangewordstotheir“stems”.Forexample,“confidential”,“confidentiality”,and“confidentis”areallstemmedto“confidenti”.Usingastemmercandramaticallyreducethesizeoftheindex,butusersexecutingsearchesshouldbeawarethatsomesearchexpressionswillnotworkasexpectedbecauseofstemming.Forexample,searchingwithawildcardsuchas "confidential*" willreturnnomatchesbecausethewordswerestemmedto“confidenti”duringindexing.Withoutawildcard,thewordinthesearchexpressionisalsostemmedandthereforethesearchsucceeds.

GPTextTextAnalysisChainsInadditiontothetextanalysischainsSolrprovides,GPTextprovidesthefollowingtextanalysischains:

text_intl,theInternationalTextAnalyzer

text_sm,theSocialMediaTextAnalyzer

text_intl,theInternationalTextAnalyzertext_intl isthedefaultGPTextanalyzer.Itisamultiplelanguagetextanalyzerfor text fields.IthandlesLatin-basedwordsandChinese,Japanese,andKorean(CJK)characters.

text_intl processesdocumentsasfollows.

1. SeparatesCJKcharactersfromotherlanguagetext.

2. Identifiescurrencytokensorsymbolsthatwereignoredinthefirstpass.

3. ForanyCJKcharacters,generatesabigramfortheCJKcharacterand,forKoreancharactersonly,preservestheoriginalword.

NotethatCJKandnon-CJKtextaretreatedasseparatetokens.PreservingtheoriginalKoreanwordincreasesthenumberoftokensinadocument.

FollowingisthedefinitionfromtheSolr managed-schema template.

©CopyrightPivotalSoftware,Inc,2013-2017 38 2.1.2

Page 39: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

<fieldTypeautoGeneratePhraseQueries="true"class="solr.TextField"name="text_intl"positionIncrementGap="100">

<analyzertype="index"><tokenizerclass="com.emc.solr.analysis.worldlexer.WorldLexerTokenizerFactory"/><filterclass="solr.CJKWidthFilterFactory"/><filterclass="solr.LowerCaseFilterFactory"/><filterclass="com.emc.solr.analysis.worldlexer.WorldLexerBigramFilterFactory"han="true"hiragana="true"katakana="true"hangul="true"/><filterclass="solr.StopFilterFactory"enablePositionIncrements="true"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="solr.PorterStemFilterFactory"/></analyzer><analyzertype="query"><tokenizerclass="com.emc.solr.analysis.worldlexer.WorldLexerTokenizerFactory"/><filterclass="solr.CJKWidthFilterFactory"/><filterclass="com.emc.solr.analysis.worldlexer.WorldLexerBigramFilterFactory"han="true"hiragana="true"katakana="true"hangul="true"/><filterclass="solr.StopFilterFactory"enablePositionIncrements="true"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="solr.PorterStemFilterFactory"/></analyzer></fieldType>

Followingaretheanalysisstepsfor text_intl .

1. Theanalyzerchainforindexingbeginswithatokenizercalled WorldLexerTokenizerFactory .Thistokenizerhandlesmostmodernlanguages.ItseparatesCJKcharactersfromotherlanguagetextandidentifiesanycurrencytokensorsymbols.

2. The solr.CJKWidthFilterFactory filternormalizestheCJKcharactersbasedoncharacterwidth.

3. The solr.LowerCaseFilterFactory filterchangesallletterstolowercase.

4. The WorldLexerBigramFilterFactory filtergeneratesabigramforanyCJKcharacters,leavesanynon-CJKcharactersintact,andpreservesoriginalKorean-languagewords.Setthe han , hiragana , katakana ,and hangul attributesto "true" togeneratebigramsforallsupportedCJKlanguages.

5. The solr.StopFilterFactory removescommonwords,suchas“a”,“an”,and“the”,whicharelistedinthe stopwords.txt configurationfile(seeToconfigureanindex).Iftherearenowordsinthe stopwords.txt file,nowordsareremoved.

6. The solr.KeywordMarkerFilterFactory markstheEnglishwordstoprotectfromstemming,usingthewordslistedinthe protwords.txtconfigurationfile(seeToconfigureanindex).If protwords.txt doesnotcontainalistofwords,allwordsinthedocumentarestemmed.

7. Thefinalfilteristhestemmer,inthiscase solr.PorterStemFilterFactory ,afaststemmerfortheEnglishlanguage.

Note:The text_intl analyzerchainforqueryingisthesameasthe text analyzerchainforindexing.

Ananalysischain, text ,isincludedinGPText’sSolr managed-schema andisbasedonSolr’sdefaultanalysischain.Becauseitstokenizersplitsonwhitespace, text cannotprocessCJKlanguages:whitespaceismeaninglessforCJKlanguages.Bestpracticeistousethe text_intl analyzer.

Forinformationaboutusingananalyzerchainotherthanthedefault,seeUsingthetext_smSocialMediaAnalyzer.

GPTextLanguageProcessing

Theroot-leveltokenizer, WorldLexerTokenizerFactory ,tokenizesinternationallanguages,includingCJKlanguages. WorldLexerTokenizerFactory tokenizeslanguagesbasedontheirUnicodepointsand,forLatin-basedlanguages,whitespace.

Note:UnicodeistheencodingforalltextintheGreenplumDatabase.

Thefollowingaresampleinputto,andoutputfrom,GPText.Eachlineintheoutputcorrespondstoaterm.

EnglishandCJKinput:

₩10대부분english자선단체는.

EnglishandCJKoutput:

₩10

©CopyrightPivotalSoftware,Inc,2013-2017 39 2.1.2

Page 40: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

대부분

대부

부분

english

자선

단체는

단체

체는

Bulgarianinput:

Cъставнаnарламента:вж.nротоколи

Bulgarianoutput:

cъстав

на

nарламента

вж

протоколиа

Danishinput:

Genoptagelseafsessionen

Danishoutput:

genoptagelse

af

sessionen

text_intlFilters

Thetext_intlanalyzerusesthefollowingfilters:

The CJKWidthFilterFactory normalizeswidthdifferencesinCJKcharacters.Thisfilternormalizesallcharacterwidthstofullwidth.

The WorldLexerBigramFilterFactory filterformsbigrams(pairs)ofCJKtermsthataregeneratedfrom WorldLexerTokenizerFactory .Thisfilterdoesnotmodifynon-CJKtext.WorldLexerBigramFilterFactory acceptsattributesthatguidethecreationofbigramsforCJKscripts.Forexample,iftheinputcontainsHANGULscriptbutthe hangul attributeissetto false, thisfilterwillnotcreatebigramsforthatscript.Toensurethat WorldLexerBigramFilterFactory createsbigramsasrequired,settheCJKattributes han , hiragana , katakana ,and hangul to true .

text_sm,theSocialMediaTextAnalyzerTheGPText text_sm textanalyzeranalyzestextfromsourcessuchassocialmediafeeds. text_sm consistsofatokenizerandtwofilters.Toconfigurethetext_sm textanalyzer,usethe gptext-config utilitytoeditthe managed-schema file.SeeTousethetext_smSocialMediaAnalyzerfordetails.

text_sm normalizesemoticons:itreplacesemoticonswithtextusingthe emoticons.txt configurationfile.Forexample,itreplacesahappyfaceemoticon,:-) ,withthetext“happy”.

ThefollowingisthedefinitionfromtheSolr managed-schema template.

©CopyrightPivotalSoftware,Inc,2013-2017 40 2.1.2

Page 41: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

<fieldTypeautoGeneratePhraseQueries="true"class="solr.TextField"name="text_sm"positionIncrementGap="100"termVectors="true"termPositions="true"termOffsets="true"><analyzertype="index"><tokenizerclass="com.emc.solr.analysis.text_sm.twitter.TwitterTokenizerFactory"delimiter="\t"emoticons="emoticons.txt"/><!--Caseinsensitivestopwordremoval.AddenablePositionIncrements=trueinboththeindexandqueryanalyzerstoleavea'gap'formoreaccuratephrasequeries.--><filterclass="solr.StopFilterFactory"enablePositionIncrements="true"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.LowerCaseFilterFactory"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="com.emc.solr.analysis.text_sm.twitter.EmoticonsClassifierFilterFactory"delimiter="\t"emoticons="emoticons.txt"/><filterclass="com.emc.solr.analysis.text_sm.twitter.TwitterStemFilterFactory"/><analyzertype="query"><tokenizerclass="com.emc.solr.analysis.text_sm.twitter.TwitterTokenizerFactory"delimiter="\t"emoticons="emoticons.txt"/><filterclass="solr.StopFilterFactory"enablePositionIncrements="true"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.LowerCaseFilterFactory"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="com.emc.solr.analysis.text_sm.twitter.EmoticonsClassifierFilterFactory"delimiter="\t"emoticons="emoticons.txt"/><filterclass="com.emc.solr.analysis.text_sm.twitter.TwitterStemFilterFactory"/></analyzer></fieldType>

TheTwitterTokenizer

TheTwittertokenizerextendstheEnglishlanguagetokenizer, solr.WhitespaceTokenizerFactory, torecognizethefollowingelementsasterms.

Emoticons

Hyperlinks

Hashtagkeywords(forexample,#keyword)

Userreferences(forexample,@username)

Numbers

Floatingpointnumbers

Numbersincludingcommas(forexample10,000)

timeexpressions(forexample,9:30)

Thetext_smfilters

com.emc.solr.analysis.socialmedia.twitter.EmoticonsClassifierFilterFactory classifiesemoticonsas happy , sad ,or wink .Itisbasedonthe emoticons.txt file(oneofthefilesyoucaneditwith gptext-config ),andisintendedforfutureuse,suchasinsentimentanalysis.

TheTwitterStemFilterFactory

com.emc.solr.analysis.socialmedia.twitter.TwitterStemFilterFactory extendsthe solr.PorterStemFilterFactory classtobypassstemmingofthesocialmediapatternsrecognizedbythe twitter.TwitterTokenizerFactory .

©CopyrightPivotalSoftware,Inc,2013-2017 41 2.1.2

Page 42: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Theemoticons.txtfile

Thisfilecontainslistsofemoticonsfor“happy,”“sad,”and“wink.”Theyareseparatedbyatabbydefault.Youcanchangetheseparationtoanycharacterorstringbychangingthevalueof delimiter inthesocialmediaanalysischain.Thefollowingisasamplelinefromthe text_sm analyzerchain:

<filterclass="com.emc.solr.analysis.text_sm.twitter.EmoticonsClassifierFilterFactory"delimiter="\t"emoticons="emoticons.txt"/>

Usingthetext_smSocialMediaAnalyzerTheSolr managed-schema filespecifiestheanalyzertousetoindexafield.Thedefaultanalyzeris text_intl .Tospecifythe text_sm socialmediaanalyzer,youusethe gptext-config utilitytomodifytheSolr managed-schema foryourindex.

Thestepsare:

1. Createanindexusing gptext.create_index() .Theindexcontainsalinesimilartothefollowingneartheendofthefile:

<fieldname="text_search_col"indexed="true"stored="false"type="text_intl"/>

The type fieldspecifiestheanalyzertouse. text_intl isthedefault.

2. Use gptext-config toeditthe managed-schema file:

gptext-config-fmanaged-schema-i<index_name>

3. Modifythelineasfollows:

<fieldname="text_search_col"indexed="true"stored="false"type="text_sm"/>

gptext-configfetchesthe managed-schema filefromtheconfigurationfilesdirectoryforyourindexandopensitinthe vi editor.Afteryoueditthefile,saveit,andquit vi ,gptext-configreturnsthefiletoitsoriginalconfigurationfilesdirectory.

UsingMultipleAnalyzerChainsIfyouwanttoindexafieldusingtwodifferentanalyzerchainssimultaneously,youcandothis:

Createanewemptyindex.Thenusethe gptext-config utilitytoaddanewfieldtotheindexthatisacopyofthefieldyouareinterestedin,butwithadifferentnameandanalyzerchain.

Letusassumethatyourindex,asinitiallycreated,includesafieldtoindexnamed mytext .Alsoassumethatthisfieldwillbeindexedusingthedefaultinternationalanalyzer( text_intl ).

Youwanttoaddanewfieldtotheindex’s managed-schema thatisacopyof mytext andthatwillbeindexedwithadifferentanalyzer(saythe text_smanalyzer).Todoso,followthesesteps:

1. Createanemptyindexwith gptext.create_index() .

2. Opentheindex’s managed-schema fileforeditingwith gptext-config .

3. Adda <field> inthe managed-schema foranewfieldthatwilluseadifferentanalyzerchain.Forexample:<fieldindexed="true"name="mytext2"stored="false"type="text_sm"/>

Bydefiningthetypeofthisnewfieldtobe text_sm ,itwillbeindexedusingthesocialmediaanalyzerratherthanthedefault text_intl .

4. Adda <copyField> in managed-schema tocopytheoriginalfieldtothenewfield.Forexample:<copyFielddest="mytext2"source="mytext"/>

5. Indexandcommitasyounormallywould.

Thedatabasecolumn mytext isnowintheindextwicewithtwodifferentanalyzerchains.Onecolumnis mytext ,whichusesthedefaultinternational

©CopyrightPivotalSoftware,Inc,2013-2017 42 2.1.2

Page 43: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

analyzerchain,andtheotheristhenewlycreated mytext2, whichusesthesocialmediaanalyzerchain.

UsingDifferentAnalyzerChainsforIndividualFieldsYoucanusedifferentanalyzersforindividualfieldsbyeditingthemanaged-schemaconfigurationfile.Forexample,ifonefieldcontainsEnglishtextandanothercontainsChineselanguagetext,youcanspecifydifferentanalyzersforthetwofields.

ExampleYouhaveatablenamed email_tbl withthefollowingdefinition:

createtableemail_tbl(idbigint,english_contenttext,chinese_contenttext,timestampdate,usernametext,ageint,...)#additionalcolumnsthatarenotindexed

Youwanttoindexthesixcolumnsshown— id , english_content , chinese_content , timestamp , username ,and age .

Forthecolumn english_content ,youwanttousetheEnglishlanguageanalyzercalled“text_en”forthetextsegmentation.

Forthecolumn chinese_content ,youwanttousetheinternationallanguageanalyzernamed“text_intl”.

Herearestepstoimplementthisexample:

1. CreatetheGPTextindexforthetable.

SELECT*FROMgptext.create_index('public','email_tbl','id','english_content');

2. Modifytheanalyzerforeachcolumnin managed-schema .

$gptext-config-idb.public.email_tbl-fmanaged-schema

3. Findtheelementforthe english_content field.

<fieldname="english_content"type="*"indexed="true"stored="true"/>

Changethe type attributeto text_en .

<fieldname="english_content"type="text_en"indexed="true"stored="true"/>

4. Findtheelementforthe chinese_content field.

<fieldname="chinese_content"type="*"indexed="true"stored="true"/>

Changethe type attributeto text_intl .

<fieldname="chinese_content"type="text_intl"indexed="true"stored="true"/>

5. Indexthetable.

SELECT*FROMgptext.index(TABLE(SELECTid,english_content,chinese_content,timestamp,username,ageFROMemail_tbl),'db.public.email_tbl');

6. Committheindex.

SELECT*FROMgptext.commit_index('db.public.email_tbl');

Thefieldtypes text_en and text_intl aredefinedin <fieldType> entriesinthemanaged-schemafileandthenreferencedinthe type attributeofthe

©CopyrightPivotalSoftware,Inc,2013-2017 43 2.1.2

Page 44: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

<field> element.

Youcandefineacustomfieldtypebyaddinga <fieldType> entrywithcustomanalyzersandthensettingthefield’s type attributetothenameofthecustomfieldtype.Forexample,thefollowing“text_customize”fieldtypeisacopyofthe“text_en”fieldtypeentrywiththesynonymfiltercommentedoutintheindexanalyzer.Thiscustomfieldtypewillapplythesynonymfiltertoqueries,butnottotheindex.

<fieldTypename="text_customize"class="solr.TextField"positionIncrementGap="100"><analyzertype="index"><tokenizerclass="solr.StandardTokenizerFactory"/><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><!--inthisexample,wewillonlyusesynonymsatquerytime<filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="false"/>--><filterclass="solr.LowerCaseFilterFactory"/></analyzer><analyzertype="query"><tokenizerclass="solr.StandardTokenizerFactory"/><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="true"/><filterclass="solr.LowerCaseFilterFactory"/></analyzer></fieldType>

Afieldtypecanalsobecustomizedbyaddinganalyzersaschildelementsofthe <field> element:

<fieldname="english_content"type="text"indexed="true"stored="false"><analyzertype="index"><tokenizerclass="solr.StandardTokenizerFactory"/><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><!--inthisexample,wewillonlyusesynonymsatquerytime<filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="false"/>--><filterclass="solr.LowerCaseFilterFactory"/></analyzer><analyzertype="query"><tokenizerclass="solr.StandardTokenizerFactory"/><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="true"/><filterclass="solr.LowerCaseFilterFactory"/></analyzer></field>

©CopyrightPivotalSoftware,Inc,2013-2017 44 2.1.2

Page 45: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

AdministeringGPTextGPTextadministrationincludessecurityconsiderations,monitoringSolrindexstatistics,andtroubleshooting.

ChangingGPTextServerConfigurationParametersConfigurationparametersusedwithGPTextarebuilt-intoGPTextwithdefaultvalues.YoucanchangethevaluesfortheseparametersbysettingthenewvaluesinaGreenplumDatabasesession.ThenewvaluesarestoredinZooKeeper.GPTextindexesusethevaluesofconfigurationparameterswhentheyarecreated.Changingconfigurationparametersaffectsnewindexes,butdoesnotaffectexistingindexes.

SeeGPTextConfigurationParametersforacompletelistofconfigurationparameters.

Aone-timeGreenplumDatabaseconfigurationchangeisneededforGreenplumDatabasetoallowsettinganddisplayingGPTextconfigurationvariables.Asthe gpadmin user,enterthefollowingcommandsinashell:

$gpconfig-ccustom_variable_classes-v'gptext'$gpstop-u

ThenconnecttoadatabasethatcontainstheGPTextschemaandexecutethe gptext.version() functiontoexposetheGPTextconfigurationvariables:

twitter=#select*fromgptext.version();

ChangethevaluesofGPTextconfigurationvariablesusingthe SET commandinasessionwithadatabasethatcontainstheGPTextschema.Thefollowingexamplesetsvaluesforthreeconfigurationparametersina psql session:

twitter=#setgptext.idx_buffer_size=10485760;SETtwitter=#setgptext.idx_delim='|';SETtwitter=#setgptext.extension_factor=5;SET

Youcanviewthecurrentvalueofaconfigurationparameterthatyouhavesetusingthe SHOW command:

twitter=#showgptext.idx_delim;gptext.idx_delim------------------|(1row)

SecurityandGPTextIndexesGPTextsecurityisbasedonGreenplumDatabasesecurity.YourprivilegestoexecuteGPTextfunctionsdependonyourprivilegesforthedatabasetablethatisthesourcefortheindex.Forexample,ifyouhaveSELECTprivilegesforatableintheGreenplumdatabase,thenyouhaveSELECTprivilegesforanindexgeneratedfromthattable.

ExecutingGPTextfunctionsrequiresoneofOWNER,SELECT,INSERT,UPDATE,orDELETEprivileges,dependingonthefunction.TheOWNERisthepersonwhocreatedthetableandhasallprivileges.SeetheGreenplumDatabaseAdministratorGuideforinformationaboutsettingprivileges.

CheckingZooKeeperStatusUsethezkManagerutilityfromthecommandlinetochecktheZooKeeperclusterstatus.IftheZookeeprclusterisboundtoGPText,youcanstartandstoptheclusterusingzkManager.

TochecktheZooKeeperclusterstatus,runthefollowingcommand:

zkManagerstate

©CopyrightPivotalSoftware,Inc,2013-2017 45 2.1.2

Page 46: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Theutilityliststhehosts,ports,latency,andfollower/leadermodeforeachZooKeeperinstance.Ifanodeisdown,it’smodeislistedasDown.

IftheZooKeeperclusterwasinstalledbytheGPTextinstaller,thezkManagerutilitycanbeusedtostartorstoptheZooKeepercluster.Tostartthecluster,runthefollowingcommand:

zkManagerstart

TostopZooKeeper,runthiscommand:

zkManagerstop

CheckingSolrCloudStatusYoucancheckthestatusoftheSolrCloudclusterandindexesbyrunningthe gptext-state utilityfromthecommandline.

TocheckthestateoftheGPTextnodesandeachindex,runthe gptext-state utilitywiththe -D ( --details )option:

gptext-state-D

ThiscommandreportsthestatusoftheGPTextnodesandstatusofeachGPTextindex.

Run gptext-statelist toviewjusttheindexes.

The gptext-statehealthcheck commandcheckstheGPTextconfigurationfiles,theindexstatus,requireddiskspace,userprivileges,andindexanddatabaseconsistency.Bydefault,therequireddiskspacecheckpassesifthereisatleast20%diskfree.Youcansetadifferentdiskfreethresholdusingthe--disk_free option.Forexample:

[gpadmin@gpdb-sandbox~]$gptext-statehealthcheck--disk_free=2520160629:15:45:24:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-ExecutehealthcheckonGPTextcluster!20160629:15:45:24:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextconfigfiles...20160629:15:45:24:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160629:15:45:24:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextindexstatus...20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforrequireddiskspace...20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforrequireduserprivileges...20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforindexesanddatabaseconsistency...20160629:15:45:27:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160629:15:45:27:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.

Seethe gptext-state utilityreferenceforadditionaloptions.

RecoveringGPTextNodesUsethe gptext-recover utilitytorecoverdownGPTextnodes,forexampleafterafailedGreenplumsegmenthostisrecovered.

Withnoarguments,the gptext-recover utilitydiscoversdownGPTextnodesandrestartsthem.

Withthe -f (or --force )option,ifaGPTextnodecannotberestartedandnoshardsaredown,thenodeisdeletedandcreatedagainonthesamehost.Missingreplicasareaddedandthefailednodeandfailedreplicasareremoved.

The -H ( --new_hosts )optionallowsrecreatingdownGPTextnodesonnewhoststhatreplacefailedhosts.ThedownGPTextnodesaredeletedandrecreatedonthenewhosts.Theargumenttothe -H optionisacomma-separatedlistofthenewhoststhataretoreplacethefailedhosts.Thenumberofnewhostsmustmatchthenumberoffailedhosts.Ifshardsaredown,itadvisesreindexing.Ifonlysomereplicasaredown,itrecreatesthereplicasonthenewhostsandupdates gptext.conf .

The -r optionrecoversreplicas,butdoesnotattempttorecoveranydownnodes.

Note:BeforerecoveringGPTextnodesonnewlyaddedhosts,ensurethatthefollowingGPTextprerequisiteshavebeeninstalledonthehost:

OracleJava1.6

©CopyrightPivotalSoftware,Inc,2013-2017 46 2.1.2

Page 47: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Python2.6

TheLinux lsof utility

ViewingSolrIndexStatisticsYoucanviewSolrindexstatisticsbyrunningthe gptext-state utilityfromthecommandline.

TolistallGPTextindexes,enterthefollowingcommandatthecommandline:

gptext-statelist

Acommandlinethatretrievesallstatisticsforanindex:

gptext-state--indexwikipedia.public.articles

Acommandlinethatretrievesthenumberofdocumentsinanindex:

gptext-state--indexwikipedia.public.articles--stats_columns=num_docs

Acommandlinethatretrieves num_docs andtheindex size :

gptext-state--indexwikipedia.public.articles--stats_columnsnum_docs,size

BackingUpandRestoringGPTextIndexesWiththe gptext-backup managementutility,youcanbackupaGPTextindextoashareddirectorysothat,ifneeded,youcanquicklyrecoverfromafailure.ThebackupcanberestoredtothesameGPTextsystemortoanothersystemwiththesamenumberofGreenplumsegments.

The gptext-backup managementutilitybacksupanindexanditsconfigurationfilestoasharedfilesystem,whichmustbeaccessibleandwritablebyeachhostintheGreenplumcluster.The --path command-lineoptionspecifiesthelocationofadirectoryonthemountedfilesystem.The --name optionprovidesanameforthebackup.

The gptext-backup utilityfirstchecksthat:

theGPTextclusterisup

thesharedfilesystemisvalid

thedirectoryspecifiedwiththe --name optiondoesnotalreadyexistatthelocationspecifiedbythe --path option

Theutilitycreatesthenewdirectoryandsavesonecopyofeachindexshardtothatdirectory,alongwiththeindex’sconfigurationfiles.

Torestoreanindex,usethe gptext-restore managementutility.TheGPTextsystemyourestoretomustbeonaGreenplumclusterwiththesamenumberofsegments.Thedatabase,schema,andbasetablefortheindexmustbepresent.

The --index optionspecifiesthenameoftheGPTextindexthatwillberestored.Iftheindexexists,youmustfirstdropitwiththe gptext.delete() user-definedfunction.

The --path optionspecifiesthelocationofthedirectorycontainingthebackupfiles—thedirectorythat gptext-backup createdonthesharedfilesystem.

Seegptext-backupforsyntaxanddetailsforrunning gptext-backup .Seegptext-restoreforsyntaxanddetailsforrunning gptext-backup .

ExpandingtheGPTextClusterThe gptext-expand managementutilityaddsGPTextnodestothecluster.Therearetwowaystoaddnodes:

AddGPTextnodestoexistinghostsinthecluster.ThisoptionincreasesthenumberofGPTextnodesoneachhost.

AddGPTextnodestonewhostsaddedbyusingtheGreenplum gpexpand managementutilitytoexpandtheGreenplumDatabasesystem.

©CopyrightPivotalSoftware,Inc,2013-2017 47 2.1.2

Page 48: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

AddingGPTextNodestoExistingSegmentHostsToaddnodestoexistingsegmenthosts,runthe gptext-expand utilitywithacommandlikethefollowing:

gptext-expand-e-p/data1/nodes,/data2/nodes

ThisexampleaddstwoGPTextnodestoeachhost.

The -e ( --existing )optionspecifiesthatnodesaretobeaddedtoexistinghosts.

The -p ( --expand_paths )optionprovidesalistofdirectorieswherethenewnodes’datadirectoriesaretobecreated.TheseshouldbethesamedirectoriesthatcontaintheGreenplumsegmentdatadirectoriesandexistingGPTextdatadirectories.Thenumberofdirectoriesinthelististhenumberofnewnodesthatareadded.

AdirectorycanberepeatedinthedirectorylistmultipletimestoincreasethenumberofnewGPTextnodestocreate.Forexample,ifthereiscurrentlyoneGPTextnodeperhostinthe /data1/nodes directory,youcouldaddthreenodeswithacommandlikethefollowing:

gptext-expand-e-p/data1/nodes,/data2/nodes,/data2/nodes

Thisaddsonenodetothe /data1/nodes directoryandtwonodestothe /data2/nodes directorysotherearetwoGPTextnodesineachdirectory.

AddingGPTextnodesaffectsnewindexes,butnotexistingindexes.Replicasfornewindexeswillbedistributedacrossallofthenodes,includingbotholdnodesandthenewlycreatednodes.Replicasforindexesthatexistedbeforerunning gptext-expand arenotautomaticallymoved.Rebalancingexistingreplicasrequiresreindexing.

AddingGPTextNodestoNewHostsCheckthatthefollowingGPTextprerequisitesareinstalledoneachnewhostaddedtotheGreenplumDatabasecluster:

OracleJava1.8

Python2.6orgreater

Linux lsof utility

NewhostsmustbereachablebyallhostsintheGPTextcluster,includingexistinghostsandthenewhostsyouareadding.

AfterexpandingtheGreenplumclusterwiththe gpexpand managementutility,call gptext-expand withthe -H ( --new_hosts )optionandalistofthenewhostsonwhichtoinstallGPText:

gptext-expand-Hnewhost1,newhost2

The gptext-expand utilityinstallsGPTextbinariesonthenewhostsandthencreatesnewGPTextnodesonthenewhosts.

ExpandingaGreenplumclusterincreasesthenumberofsegments,sothenumberofGPTextindexshardsforexistingindexesmustbeincreasedtoequalthenewnumberofsegments.Thisrequiresreindexingallexistingdocuments.Newlycreatedindexeswillautomaticallybedistributedamongthenewshards.

TroubleshootingGPTexterrorsareofthefollowingtypes:

Solrerrors

gptext errors

MostoftheSolrerrorsareself-explanatory.

gptext errorsarecausedbymisuseofafunctionorutility.Theyprovideamessagethattellsyouwhenyouhaveusedanincorrectfunctionorargument.

MonitoringLogs

©CopyrightPivotalSoftware,Inc,2013-2017 48 2.1.2

Page 49: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

YoucanexaminetheGreenplumDatabaseandSolrlogsformoreinformationiferrorsoccur.GreenplumDatabaselogsresidein:

segment-directory/pg-log

Solrlogsresidein:

<GPDBpath>/solr/logs

DeterminingSegmentStatuswithgptext-stateUsethe gptext-state utilitytodetermineifanyprimaryormirrorsegmentsaredown.See gptext-state intheGPTextManagementUtilitiesReference.

©CopyrightPivotalSoftware,Inc,2013-2017 49 2.1.2

Page 50: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

GPTextHighAvailabilityTheGPTexthighavailabilityfeatureensuresthatyoucancontinueworkingwithGPTextindexesaslongaseachshardintheindexhasatleastoneworkingreplica.

AGPTextindexhasoneshardforeachGreenplumsegment,sothereisaone-to-onecoorespondencebetweenGreenplumsegmentsandGPTextindexshards.TheshardmanagedbyaGreenplumsegmentisanindexofthedocumentsthataremanagedbythatsegment.

TheGPTexthighavailabilitymechanismistomaintainmultiplecopies,orreplicas,oftheshard.TheZooKeeperservicethatmanagesSolrCloudchoosesaGPTextinstance(SolrCloudnode)foreachreplicatoensureevendistributionandhighavailability.Foreachshard,onereplicaiselectedleaderandtheGreenplumsegmentassociatedwiththeshardoperatesonthisleaderreplica.TheGPTextinstancemanagingtheleadreplicamayormaynotbeonanotherGreenplumhost,soindexingandsearchingoperationsarepassedovertheGreenplumcluster’sinterconnectnetwork.SolrCloudreplicateschangesmadetotheleaderreplicatotheremainingreplicas.

ThefollowingfigureillustratestherelationshipsbetweenGreenplumsegmentsandGPTextindexshardsandreplicas.Theleaderreplica(green)

Thenumberofreplicastocreateforeachshard,thereplicationfactor,isaSolrCloudproperty.Bydefault,GPTextstartsSolrCloudwithareplicationfactorofthree.ThereplicationfactorforeachindividualindexisthevalueoftheSolrCloudreplicationfactorwhentheindexiscreated.Changingthereplicationfactordoesnotalterthereplicationfactorforexistingindexes.

GreenplumSegmentorHostFailureIfaGreenplumprimarysegmentfailsanditsmirrorisactivated,GPTextfunctionsandutilitiescontinuetoaccesstheleaderreplica.Nointerventionisneeded.

Ifahostintheclusterfails,bothGreenplumandGPTextareaffected.MirrorsfortheGreenplumprimarysegmentslocatedonthefailedhostareactivatedonotherhosts.SolrCloudelectsanewleaderreplicaforaffectedshards.BecauseGreenplumsegmentmirrorsandGPTextshardreplicasaredistributedthroughoutthecluster,asinglehostfailureshouldnotpreventtheclusterfromcontinuingtooperate.Theperformanceofdatabasequeriesandindexingoperationswillbeaffecteduntilthefailedhostisrecoveredandtheclusterisbroughtbackintobalance.

ZooKeeperClusterAvailabilitySolrCloudisdependentonaworking,availableZooKeepercluster.ForZooKeepertobeactive,amajorityoftheZooKeeperclusternodesmustbeupandabletocommunicatewitheachother.AZooKeeperclusterwiththreenodescancontinuetooperateifoneofthenodesfails,sincetwoisamajorityofthree.Totoleratetwofailednodes,theclustermusthaveatleastfivenodessothatthenumberofworkingnodesremainingafterthefailureareamajority.Totoleratennodefailures,then,aZooKeeperclustermusthave2*n*+1nodes.ThisiswhyZooKeeperclustersusuallyhaveanoddnumberofnodes.

Thebestpracticeforahigh-availabilityGPTextclusterisaZooKeeperclusterwithfiveorsevennodessothattheclustercantoleratetwoorthreefailednodes.

©CopyrightPivotalSoftware,Inc,2013-2017 50 2.1.2

Page 51: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

ManagingGPTextClusterHealthGPTextdocumentindexingandsearchingservicesremainavailableaslongaseachshardofanindexhasatleastoneworkingreplica.Toensureavailabilityintheeventofafailure,itisimportanttomonitorthestatusoftheclusterandensurethatalloftheindexshardreplicasarehealthy.YoucanmonitortheSolrCloudclusterandindexesusingtheSolrCloudDashboardorusingGPTextfunctionsandmanagementutilities.AccesstheSolrCloudDashboardwithawebbrowseronanyGPTextinstancewithaURLsuchas http://sdw3:18983/solr .(TheportnumbersforGPTextinstancesaresetwiththeGPTEXT_PORT_BASE parameterintheinstallationparametersfileatinstallationtime.)

RefertotheApacheSolrClouddocumentationforhelpusingtheSolrCloudDashboard.

MonitoringtheClusterwithGPTextTheGPText gptext-state managementutilityallowsyoutoquerythestateoftheGPTextclusterandindexes.Youcanalsouse gptext.index_status() toviewthestatusofallindexesoraspecifiedindex.

ToseetheGPTextclusterstaterunthe gptext-state command-lineutilitywiththe -d optiontospecifyadatabasethathastheGPTextschemainstalled.

gptext-state-dmydb

TheutilityreportsanyGPTextnodesthataredownandliststhestatusofeveryGPTextindex.Foreachindex,thedatabasename,indexname,andstatusarereported.Thestatuscolumncontains“Green”,“Yellow”,or“Red”:-Green–allreplicasforallshardsarehealthy-Yellow–allshardshaveatleastonehealthyreplicabutatleastonereplicaisdown-Red–noreplicasareavailableforatleastoneindexshard

ToseethedistributionofindexshardsandreplicasintheGPTextcluster,executethisSQLstatement.

SELECTindex_name,shard_name,replica_name,node_nameFROMgptext.index_summary()ORDERBYnode_name;

TolistallGPTextindexes,runthe gptext-statelist command.

gptext-statelist-dmydb

The gptext-statehealthcheck commandchecksthehealthofthecluster.The -f flagspecifiesthepercentageofavailablediskspacerequiredtoreportahealthycluster.Thedefaultis10.

gptext-statehealthcheck-f20-dmydb

See gptext-state intheManagementUtilitiesreferenceforhelpwithadditional gptext-state options.

Thegptext.index_status()user-definedfunctionreportsthestatusofallGPTextindexesoraspecifiedindex.

SELECT*FROMgptext.index_status();

Specifyanindexnametoreportonlythestatusofthatindex.

SELECT*FROMgptext.index_status('mydb.public.messages');

AddingandDroppingReplicasThe gptext-replica utilityaddsordropsareplicaofasingleindexshard.Usethe gptext.add_replica() and gptext.delete_replica() user-definedfunctionstoperformthesametasksfromwithinthedatabase.

Ifareplicaofashardfails,use gptext-replica toaddanewreplicaandthendropthefailedreplicatobringtheindexbackto“Green”status.

gptext-replicaadd-imydb.public.messages-sshard3

Hereistheequivalent,usingthe gptext.add_replica() function:

©CopyrightPivotalSoftware,Inc,2013-2017 51 2.1.2

Page 52: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

SELECT*FROMgptext.add_replica('mydb.public.messages',shard3);

ZooKeeperdetermineswherethereplicawillbelocated,butyoucanalsospecifythenodewherethereplicaiscreated:

gptext-replicaadd-imydb.public.messages-sshard3-nsdw3

Inthe gptext.add_replica() function,addthenodenameasathirdargument.

Todropareplica,call gptext.delete_replica() withthenameoftheindex,thenameoftheshard,andthenameofthereplica.Youcanfindthenameofthereplicabycalling gptext.index_status(index_name) .Thenameisintheformat core_noden .Anoptional -o flagspecifiesthatthereplicaistobedeletedonlyifitisdown.

gptext-replicadrop-imydb.public.messages-sshard3-rcore_node4-o

Hereistheequivalentoftheabovecommandusingthe gptext.delete_replica() user-definedfunction.

SELECT*FROMgptext.delete_replica('mydb.public.messages','shard3','cord_node4',true);

©CopyrightPivotalSoftware,Inc,2013-2017 52 2.1.2

Page 53: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

GPTextBestPracticesEachGPText/ApacheSolrnodeisaJavaVirtualMachine(JVM)processandisallocatedmemoryatstartup.ThemaximumamountofmemorytheJVMwilluseissetwiththe -Xmx parameterontheJavacommandline.Performanceproblemsandoutofmemoryfailurescanoccurwhenthenodeshaveinsufficientmemory.

OtherperformanceproblemscanresultfromresourcecontentionbetweentheGreenplumDatabase,Solr,andZooKeeperclusters.

ThistopicdiscussesGPTextusecasesthatstressSolrJVMmemoryindifferentwaysandthebestpracticesforpreventingoralleviatingperformanceproblemsfrominsufficientJVMmemoryandothercauses.

IndexingLargeNumbersofDocumentsIndexingdocumentsconsumesdatainSolrJVMmemory.Whentheindexiscommitted,partsofthememoryarereleased,butsomedataremainsinmemorytosupportfastsearch.Bydefault,Solrperformsanautomaticsoftcommitwhen1,000,000documentsareindexedor20minutes(1,200,000milliseconds)havepassed.Asoftcommitpushesdocumentsfrommemorytotheindex,freeingJVMmemory.Asoftcommitalsomakesthedocumentsvisibleinsearches.Asoftcommitdoesnot,however,maketheindexupdatesdurable;itisstillnecessarytocommittheindexwiththe gptext.commit()

user-definedfunction.

Youcanconfigureanindextoperformamorefrequentautomaticsoftcommitbyeditingthe solrconfig.xml filefortheindex:

$gptext-config-fsolrconfig.xml-i<db>.<schema>.<index-name>

The <autoSoftCommit> elementisachildofthe <updateHandler> element.Editthe <maxDocs> and <maxTime> valuestoreducethetimebetweenautomaticcommits.Forexample,thefollowingsettingsperformanautocommitevery100,000documentsor10minutes.

<autoSoftCommit><maxDocs>100000</maxDocs><maxTime>600000</maxTime></autoSoftCommit>

IndexingVeryLargeDocumentsIndexingverylargedocumentscanusealargeamountofJVMmemory.Tomanagethis,youcansetthe gptext.idx_buffer_size configurationparametertoreducethesizeoftheindexingbuffer.

SeeChangingGPTextServerConfigurationParametersforinstructionstochangeconfigurationparametervalues.

ConfigureMaximumJVMHeapSizeEachSolrcorefileconsumesJVMheapmemory.AddingmoreindexesincreasesJVMswappingandgarbagecollectionfrequencysothatittakeslongertocreateindexesandtoloadthecorefileswhenGPTextisstarted.IfyoucontinuetocreateindexeswithoutincreasingtheJVMheap,anoutofmemoryerrorwilleventuallyoccur.

MonitorperformanceatstartupandduringindexcreationandincreasetheJVMsizewhenyoubegintoseedegradedperformance.Youcanalsousetoolssuchasjconsole,includedwiththeJavaDeveloperKit,tomonitorJavaheapusage.Ifgarbagecollectionsareoccurringtoofrequentlyandfreeingtoolittlememory,JVMheapshouldbeincreased.

Usethe -Xmx JVMcommandlineoptiontoincreasetheJVMheapsize.Forexample,this gptext-config commandsetstheJVMmaximumheapto4GB:

$gptext-config-o"-Xmx=4096M"

ManageIndexingandSearchLoadsWithhighindexingorsearchload,JVMgarbagecollectionpausescancausetheSolroverseerqueuetobackup.ForaheavilyloadedGPTextsystem,youcanpreventsomeperformanceproblemsbyschedulingdocumentindexingfortimeswhensearchactivityislow.

©CopyrightPivotalSoftware,Inc,2013-2017 53 2.1.2

Page 54: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

TermsQueriesandOutofMemoryErrorsThe gptext.terms() functionretrievestermsvectorsfromdocumentsthatmatchaquery.Anoutofmemoryerrormayoccurifthedocumentsarelarge,orifthequerymatchesalargenumberofdocumentsoneachnode.Otherfactorscancontributetooutofmemoryerrorswhenrunninga gptext.terms() query,includingthemaximummemoryavailabletotheSolrnodes(-Xmxvaluein JAVA_OPTS )andconcurrentqueries.

Ifyouexperienceoutofmemoryerrorswith gptext.terms() youcansetalowervalueforthe term_batch_size GPTextconfigurationvariable.Thedefaultvalueis1000.Forexample,youcouldtryrunningthefailingquerywith term_batch_size setto500.Loweringthevaluemaypreventoutofmemoryerrors,butperformanceoftermsqueriescanbeaffected.

SeeGPTextConfigurationParametersforhelpsettingGPTextconfigurationparameters.

ConfigureFileSystemCachingforZooKeeperGoodSolrperformanceisdependentonfastresponseforZooKeeperrequests.ZooKeeperperformsbestwhenitsdatabaseiscachedsoitdoesnothavetogotodiskforlookups.IfyoufindthatZooKeeperJVMshavefrequentdiskaccesses,lookforwaystoimprovefilecachingormoveZooKeeperdiskstofasterstorage.

TheZooKeeper zkClientTimeout parameteristhetimeaclientisallowedtonottalktoZooKeeperbeforehavingitssessionexpired.

©CopyrightPivotalSoftware,Inc,2013-2017 54 2.1.2

Page 55: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Glossary

Aanalyzer

DefinesthesetoftermsforaSolrindexfield.Ananalyzerconsistsofatokenizerandasetofoptionalfilterstobeappliedtotheinputtext.Forexample,ananalyzercanconsistofaWhiteSpaceTokenizerFactoryfollowedbyaLowerCaseFilterFactoryasafilter.Seealso:taxonomy.

Bbigram

Asequenceoftwoadjacentelementsinatokenstring.Asequenceofthreeconsecutivetokensisatrigramandasequenceofnconsecutivetokensisann-gram.

binaryclassificationTheprocessofsortingdataintooneoftwocategories,forexample,classifyingagiventextaccordingtowhetherthetextisassociatedwithapositiveornegativesentiment.

Classificationproblemswithmorethantwoclassesintowhichthedataistobecategorizedarecalledmulticlassclassificationproblems.

Ccentroid

Inclusteringproblems,acentroidrepresentstheapproximatecenterofacluster.

Acentroiddoesnothavetomapdirectlytoadatapointinthecluster.Forexample,ink-meansclusteringthecoordinatesofacentroidarethemeanofthecoordinatesofdatapoints(documents)pertainingtothatclusterandareconstantlyupdatedasnewdatapointassignmentsaremade.

clusterAsetofidenticaldatapoints.Forexample,ifsomedocumentsaretobegroupedintothreeclusters,theresultofamachinelearningalgorithmisthreeclusters:allthedocumentswithinaparticularclusteraresimilartoeachother,butdifferentfromthedocumentsinotherclusters.Theresultsandthequalityoftheclustersdependonvariousfactors,includingthealgorithmused,theparametersthatwereconfigured,andsetoffeaturesused.

collectionAcomplete,logicalindexinaGPTextorSolrCloudsystem.Alogicalindexiscomposedofmultipleshards.AGPTextcollectionhasoneshardperGreenplumsegment.

configsetTheSolrconfigurationfilesthatdefinethestructureandconfigurationofaGPTextindex,suchas managed-schema and solrconfig.xml .ThesefilesarestoredinZooKeeper.

coreAreplicaofashard,implementedinGPTextandSolrCloudasaLuceneindex.AlsocalledSolrCore.

corpusAcollectionofdocuments.Plural:corpora.

Ddictionary

Alistofuniquewordsortermsfromthedocumentsthatcomprisethevocabularyofthedocumentcollection.

dimensionAgeneralizedtermforafeatureofdata,suchaswordcountsinadocument.Dimensionsaretypicallylargeinnumber.

Ann-dimensionalvectorisavectoraccordingtowhichadocumentcanbeexpressedinann-dimensionalfeaturespace.Forexample,ifyour

©CopyrightPivotalSoftware,Inc,2013-2017 55 2.1.2

Page 56: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

dictionarycontainsnuniqueterms,adocumentcouldbeexpressedinann–dimensionalvectorwhereeachpositioncontainsthecountwithwhichaparticulartermfromthedictionaryappearsinthatdocument.Afeaturespacecouldbetheentiredictionaryorcouldbeanotherdictionary(orsetoffeatures)extractedbyusingfeatureselection.

dimensionalityreductionTheprocessofreducingthedimensions(orfeatures)accordingtowhichthedata(ordocuments)isexpressedinafeaturespace.Forexample,selectingterms(orfeatures)fromthedictionarythatappearinmorethankdocumentsintheentirecorpusgivesonesetofreduceddimensions.

Lleadreplica

Areplicathathasbeenelectedleaderforashard.Whendocumentsareindexed,SolrCloudsendsthemtotheleadreplicafortheshardandtheleaddistributesthemtoalloftheshard’sremainingreplicas.Alsocalledleader.

Mmachinelearning

Abranchofartificialintelligencethatfocusesontheconstructionandstudyofsystemsthatcanlearnfromdata.

Nnaturallanguageprocessing

Afieldofstudythatcombinescomputerscience,artificialintelligence,andliguisticstostudyinteractionsbetweencomputersandhumanlanguages.

Oontology

Theformalrepresentationofknowledgeasasetofconcepts(ideas,entities,events)andtheirpropertiesandrelationsaccordingtoasystemofcategorieswithinadomain.Ontologiesprovidethestructuralframeworksfororganizinginformationforfieldssuchasartificialintelligence.Ontologyisnotasynonymfortaxonomy.

Pproximity,termproximity

Asearchthatlooksfordocumentsinwhichtwoormoreseparatelymatchingtermoccurencesarewithinaspecifieddistance(anumberofintermediatewordsorcharacters).

Qqueryparser

Acomponentthatparsestheinputqueriesprovidedforsearch.

Rreplica

Asinglecopyofashard,managedasaSolrcore.Onereplicaforeachshardiselectedastheleadreplica,orleader.Allupdatesandsearchesaredirectedtotheleadreplica;changesarereplicatedfromtheleadtotheremainingreplicas.

©CopyrightPivotalSoftware,Inc,2013-2017 56 2.1.2

Page 57: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Ssentimentanalysis

Classifiesopinionsexpressedintextdocumentsintocategoriessuchas“positive”and“negative”.

shardAlogicalpieceorsliceofacollection.InGPText,thereisoneshardforeachGreenplumDatabasesegment.Ashardismadeupofoneormorereplicas.Onereplicaiselectedtheleadshard,orleader,andupdatestotheleaderarereplicatedtotheotherreplicas.

silhouettecoefficient(SC)Aquantitivemeasureofdataclusteringperformance.SCmeasureshowtightlygroupedallthedataintheclusterare.Itsvaluesrangebetween–1and1.Valuesnear1indicatethatclusteringwasgood,andvaluesnear-1indicatethatclusteringwasnotgoodandthedatapointmusthavebeenassignedtoanothercluster.Valuesnear0indicatethattheclusterassignmentwasambiguous,andthedatapointissomewhereontheboundaryofthecluster.

sparsevectorAvectorwhoseelementsaremostlyzerosorareunpopulated.SeetheGettingStartedwithGPTextGuideformoreinformation.

stemThepartofawordthatiscommontoallitsinflectedvariants(howyoumodifyawordtoexpressitsdifferentgrammaticalcategories,forexample,byconjugatingaverb).Forexample,receives,receiving,andreceivedallderivefromthestem“receiv”.

stemmingTheprocessforreducinganinflectedorderivedwordtoitsstem,base,orrootform.Thestemisnotnecessarilythesameastherootform.Forexample,receives,receiving,andreceivedallderivefromthestem“receiv”;therootformisreceive.

supportvectormachine(SVM)Asupervisedlearningmodelthatclassifiesdatabyanalyzingthedata,recognizingpatternsinthedata,andplacingthedatainspecificclasses.Applicationsincludesentimentanalysis,separatingspamemailfromlegitimateemail,and,iftheSortingHatwereanSVM,determiningtheHousetowhichnewHogwartsstudentsareassigned.

Ttaxonomy

Ahierarchicalsystemofclassification;amethodfordividingterms,concepts,orotherentitiesintoorderedgroupsorcategories.Taxonomiesdifferfromontologiesinthattheyaregenerallyfocused,simpletreerelationships,andontologieshavewider,broaderscopes.

termAdistinctwordorvaluewithinafield.

TF-IDFscoreTermfrequency-inversedocumentfrequency.Anumericstatisticthatreflectshowimportantawordistoasetofdocuments.Thetf-idfscoreincreasesproportionatetothenumberoftimesawordappearsinadocument.

TF-IDFvectorAvectorcontainingtf-idfscores.

tokenUnitsintowhichaninputstringisbroken.Forexample,atokencanbeindividualtermsinabigram(multipleterms)ortrigramthatappearintheinputtext.

tokenizerBreaksastreamoftextintotokensbasedondelimiters,theseparatorsthatspecifythecharacterstoconsiderasthetokenboundaries,orsomeregularexpressions.Forexample,adelimitercouldbeawhitespace.

Tokenizersarenotawareoffieldsinadocument.

tokenfilterTakesastreamoftokensproducedbyatokenizer,examineseachtoken,andeitherpassesthetokenalongordiscardsit.Forexample,atokenfiltermayremovewhitespace,unnecessarywordssuchas“a”,“an”,or“the”,orremovedotsfromacronyms.Tokenfiltersproduceanotherstreamoftokensthatcanbeinputtoothertokenfilters.

U

©CopyrightPivotalSoftware,Inc,2013-2017 57 2.1.2

Page 58: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

UniversalQueryParserTheGPTextUniversalQueryParserparsesqueriescontainingexpressionssupportedbyanysupportedqueryparser.

ZZooKeeper

ApacheZooKeeperisaserverproductthatprovidesclustermanagementservicessuchascentralizedconfigurationmanagement,loadbalancing,andfailover.ItisaSolrCloudrequirement.ZooKeeperhandlesleaderelectionsforSolrCloudreplicas.Forhighavailability,ZooKeeperisdeployedasaclusterof3,5,or7nodes.TheZooKeeperclustercanbedeployedwithGPTextonthesameclusterhostsoranexistingZooKeeperclustercanbeused.

©CopyrightPivotalSoftware,Inc,2013-2017 58 2.1.2

Page 59: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

GPTextReferencesThisguideprovidesreferenceinformationaboutPivotalGPTextfunctionsandcommand-lineutilities.

GPTextFunctions

ManagementUtilities

GPTextandSolrDataTypeMappings

GPTextConfigurationParameters

©CopyrightPivotalSoftware,Inc,2013-2017 59 2.1.2

Page 60: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

GPTextFunctionReferenceThefollowingfunctionsareavailableinPivotalGPText.

Indexinggptext.create_index()–createsanemptyindex.

gptext.index()–populatesanindex.

gptext.commit_index()–finalizesindexoperations.

gptext.enable_terms()–enablestermvectorsandpositionstoallowextractingtermsandtheirpositionsfrom text fields.

ModifyingorDeletinganIndexgptext.add_field()–addsafieldtoanindex.

gptext.delete()–deletesdocumentsmatchingasearchquery.

gptext.drop_field()–deletesafieldfromanindex.

gptext.drop_index()–deletesanindex.

Searchgptext.search()–searchesanindex.

gptext.search_count()–returnsnumberofdocumentsthatmatchsearch.

gptext.gptext_retrieve_field–extractsasinglefieldfromthe rf searchresultcolumnastext.

gptext.gptext_retrieve_field_int–extractsasinglefieldfromthe rf searchresultcolumnandconvertstoaninteger.

gptext.gptext_retrieve_field_float–extractsasinglefieldfromthe rf searchresultcolumnandconvertstoafloat.

gptext.highlight()–returnssearchresultwithsearchtermhighlighted.

FacetedSearchgptext.faceted_field_search()–search,facetedbyfields.

gptext.faceted_query_search()–search,facetedbyqueries.

gptext.faceted_range_search()–search,facetedbydefinedranges.

WorkingWithTermsgptext.enable_terms()–enablestermvectorsandpositions.

gptext.terms()–getsthetermvectorsfortheindexeddocumentsinaSolrindexforthespecifiedfield.

ConfigurationandMonitoringgptext.cluster_status()-showsstatusofindexesmanagedbytheGPTextcluster.

©CopyrightPivotalSoftware,Inc,2013-2017 60 2.1.2

Page 61: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

gptext.index_status()–showsstatusofreplicasforanindexorforallindexes.

gptext.partition_status()-listspartitionedindexesandchildpartitions.

gptext.reload_index()–reloadsSolrconfigurationfiles.

gptext.version()–returnsversionofGPTextinstallation.

HighAvailabilitygptext.add_replica()–Addsareplicaofanindexshard.

gptext.delete_replica()–Deletesareplicaofanindexshard.

GeneralPurposeFunctionsgptext.count_t()–countsnumberofrowsinatable.

PrivilegesYourprivilegestoexecutetheGPTextfunctionsdependonyourGreenplumDatabaseprivilegesforthetablefromwhichtheindexisgenerated.Forexample,ifyouhaveSELECTprivilegesforatableintheGreenplumdatabase,youhaveSELECTprivilegesforanindexgeneratedfromthattable.

ExecutingindexfunctionsrequiresoneofOWNER,SELECT,INSERT,UPDATE,orDELETEprivileges,dependingonthefunction.TheOWNERisthepersonwhocreatedthetableandhasallprivileges.SeetheSecuritysectionoftheGPTextUser’sGuideforinformationaboutsettingprivileges.

ThePrivilegesrequiredsectionforeachoftheGPTextfunctionsspecifiestheprivilegesrequiredtoexecutethatfunction.

UsageThe gptext functionsinthissectionmustbeexecutedasSQLqueriesintheform:

SELECT*FROMgptext.function();

YoumustrunthesequeriesfromtheGreenplumDatabasemaster.

Theexamplesinthissectionuseoneofthefollowing:

AGreenplumdatabasenamed wikipedia setupasfollows:

A public schema.Atablenamed articles .The articles tablehasthefollowingcolumns:id , date_time , title , content , refs .Thedefaultsearchcolumnis content .Thenameoftheindexonthe articles tableis wikipedia.public.articles .

IndexingIndexingfunctionscreate,setup,populate,andfinalize(commit)Solrindexes.

gptext.create_index()CreatesanemptySolrindex.

©CopyrightPivotalSoftware,Inc,2013-2017 61 2.1.2

Page 62: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Syntax

gptext.create_index(schema_name,table_name,id_col_name,def_search_col_name[,if_check_id_uniqueness])

or

gptext.create_index(schema_name,table_name,p_columns,p_types,id_col_name,def_search_col_name[,if_check_id_uniqueness])

Parameters

schema_name

ThenameoftheschemaintheGreenplumdatabase.table_name

ThenameofthetableintheGreenplumdatabase.Ifthetableispartitionedthismustbethenameoftheroottable.p_columns

Atextarraycontainingthenamesofthetablecolumnstoindex.If p_columns and p_types areomitted,alltablecolumnsareindexed.

Thecolumnsmustbevalidcolumnsinthetable.Thecolumnsidentifiedbythe id_col_name and def_search_col_name mustbeincludedinthearray.

Ifthe p_columns parameterissupplied,the p_types parametermustalsobesupplied.Thesizesofthe p_columns and p_types arraysmustbethesame.

p_types

AtextarraycontainingtheSolrdatatypesofthecolumnsinthe p_columns array.

Texttypescanbemappedtothenameofananalyzerchain,forexample text_intl , text_sm ,oranytypedefinedinthe managed_schema .SeeMapGreenplumDatabaseDataTypestoSolrDataTypesforequivalentSolrdatatypesforotherGreenplumtypes.

The p_types parametermustbesuppliedifthe p_columns parameterissupplied.

id_col_name

Thenameofacolumnin table_name thatisuniqueforeachrow.Thecolumnmustbeoftype int4 , int8 , varchar , text ,or uuid .def_search_col_name

Thenameofthedefaultcolumntosearchin table_name ,ifnoothercolumnisnamedinaquery.if_check_id_uniqueness

Optional.ABooleanvalue.Thedefaultistrue.Settofalsetoindexatablewithanon-uniqueIDfield.

Returntype

boolean

Privilegesrequired

OnlytheOWNERcanexecutethisfunction.

Remarks

AGPTextindexisaSolrcollection.

Thecontentsofthe id_col_name columnshould,inmostcases,beauniqueIDforeachrow.Itmustbeoftype int4 , int8 , varchar ,or text .

Ifthe if_check_id_uniqueness argumentistrue,thedefault,adocumentwithanIDmatchinganexistingIDcannotbeaddedtotheindex.

Ifthe if_check_id_uniquess argumentisfalse,documentswithduplicateIDsareallowedtobeaddedtotheindex.ThecontentofotherfieldsmayormaynotbethesameasexistingdocumentswiththesameID.WhenaqueryreturnsmultipledocumentswiththesameID,itistheuser’sresponsibilitytoanticipateandhandlethemultipledocuments.Forexample,atablecouldhavea revision columnthatisincrementedwhenanewversionofadocumentisaddedtotheindex,allowingqueriesthatomitallbutthemostrecentversionfromsearchresults.

©CopyrightPivotalSoftware,Inc,2013-2017 62 2.1.2

Page 63: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Thenameoftheindexcreatedhastheformat:

<database_name>.<schema_name>.<table_name>

Whenthetableispartitioned,theGPTextindexcreatedforthetablewillcontainrecordsforallpartitions.Ifyouspecifythenameofasubpartitiontableinthisfunctionanerrorisreturned.Theindexrecordsfordocumentsaddedtotheindexhavea __partition fieldcontainingthenameofthechildpartitiontable.SeeSearchingPartitionedTablesforsyntaxtosearchbypartitions.

Populatethenewindexwithgptext.index().

Thenumberofreplicasforeachshardisdeterminedwhentheindexiscreated.Itisthevalueofthe gptext.replication_factor serverconfigurationparameter,2bydefault.

Ifthe gptext.failover_factor serverconfigurationparameterisset, gptext.create_index() failsiftheratioofthenumberofGPTextnodesthatareuptothetotalnumberofGPTextnodesislessthanthe gptext.failover_factor value(from0.0to1.0).IndexshardscanonlybecreatedonactiveGPTextnodes,sothegptext.failover_factor parameterpreventsoverloadingtheactiveGPTextnodeswhentoomanynodesaredown.

Toindexapartitionedtable,specifythenameoftheroottable.The gptext.index() functionreturnsanerrorifyouspecifythenameofachildpartitiontable.

ExampleCreateanindex, wikipedia.public.articles ,with content asthedefaultsearchfield. sql=>SELECT*FROMgptext.create_index('public','articles','id','content');

Createanindex, wikipedia.public.articles ,with content asthedefaultsearchfield.Indexthe id , content ,and title fields. sql=>SELECT*FROMgptext.create_index('public','articles','{"id","content","title"}','{"long","text","text"}','id','content')

gptext.enable_terms()Enablestermvectorsandpositionstoallowextractingtermsandtheirpositionsfromfieldsofdatatype text .

Syntax

gptext.enable_terms(index_name,field_name)

Parameters

index_name

Thenameoftheindexforwhichyouwanttoenableterms.field_name

Thenameofthefieldforwhichyouwanttoenableterms.

Returntype

boolean

Privilegesrequired

OnlytheOWNERcanexecutethisfunction.

Remarks

Solrcanmarktermsandtheirpositionsindocumentswhenindexing.Thiscapabilityisdisabledbydefault.Use gptext.enable_terms() toenablethecapability.

©CopyrightPivotalSoftware,Inc,2013-2017 63 2.1.2

Page 64: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Call gptext.enable_terms() foreachfieldwhereyouwanttoenableterms.

Aftercallingthisfunction,youmustindexorre-indexwithgptext.index().

Example

=#SELECT*FROMgptext.enable_terms('wikipedia.public.articles','content');WARNING:Enabletermsforfield:contentofindex:wikipedia.public.articlessuccessfully.Reindexdataneeded.enable_terms--------------t(1row)

SELECT*FROMgptext.index(TABLE(SELECT*FROMarticles),'wikipedia.public.articles');SELECT*FROMgptext.commit_index('wikipedia.public.articles');

gptext.index()Populatesanindexbyindexingdatainatable.

Syntax

gptext.index(TABLE(SELECT*FROMtable_name),index_name)

Parameters

TABLE(SELECT * FROM table_name)

Thetabletobeindexed,withdatatype anytable .index_name

Nameoftheindexthatwascreatedwith gptext.create_index() andistobepopulated.

Returntype

SETOFdbidINT,num_docsBIGINT

where dbid isthe dbid ofthesegmentthatthedocumentsweresentto,and num_docs isthenumberofdocumentsthatwereindexed.

Privilegesrequired

YoumusthavetheINSERTorUPDATEprivilegetoexecutethisfunction.

Remarks

index_name musthavebeencreatedwith gptext.create_index() .

Theargumentstothe gptext.index() functionmustbeexpressions.Forexample,TABLE(SELECT*FROMarticles)createsa“table-valuedexpression”fromthearticlestable,usingthetablefunction TABLE .

Youcanselectivelyindex/updatebychangingtheinnerselectlistinthequery.

Aftersuccessfullyindexing,youmustcommittheindexwith gptext.commit_index(index_name) .

Theoutputincludesatwo-columntablewith dbid (theGreenplumsegmentID)and num_docs (thenumberofdocumentsaddedtotheindexforthatsegment)asthecolumns.

Note:

©CopyrightPivotalSoftware,Inc,2013-2017 64 2.1.2

Page 65: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Becarefulaboutdistributionpolicies:

Thefirstparameterof gptext.index() is TABLE(SELECT*FROMmessages)

.Thequeryinthisparametershouldhavethesamedistributionpolicyasthe

tableyouareindexing.Therearetwocaseswherethequerywillnothavethesamedistributionpolicy:

1. Yourqueryisajoinoftwotables

2. Youareindexinganintermediatetable(stagingtable)thatisdistributeddifferentlythanthefinaltable.

Whenthedistributionpoliciesdiffer,youmustspecify "SCATTERBY"

forthequerylikethis:

TABLE(SELECT*FROMmessagesSCATTERBYdistrib_id) ,

where distrib_id isthedistributionidusedwhenyoucreatedyourprimary/finaltable.

Example

=#SELECT*FROMgptext.index(TABLE(select*FROMarticles),'wikipedia.public.articles');dbid|num_docs------+----------3|62|5(2rows)

gptext.commit_index()Finishesanindexoperation.Theresultsofanindexingoperationarenotavailableuntilthisfunctioniscalledfortheindex.

Syntax

gptext.commit_index(index_name)

Parameters

index_name

Thenameoftheindextocommit.Ifthetableispartitionedthismustbethenameoftheroottable.

Returntype

boolean

Privilegesrequired

YoumusthavetheINSERT,UPDATE,orDELETEprivilegetoexecutethisfunction.

Remarks

Mustbecalledaftergptext.index()andgptext.delete().

Example

©CopyrightPivotalSoftware,Inc,2013-2017 65 2.1.2

Page 66: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

=#SELECT*FROMgptext.commit_index('wikipedia.public.articles');commit_index--------------t(1row)

ModifyingorDeletinganIndexYoucanchangeanindexbyaddingordroppingfields,revertinganindextoitspreviousstate,ordeletingtheindex.

gptext.add_field()Addsafieldtoyourschemaifthefieldwasaddedtothedatabaseaftertheindexwascreated.

Syntax

gptext.add_field(index_name,field_name[,is_default_search_col[,if_enable_terms]])

Parameters

index_name

Thenameoftheindextowhichyouwanttoaddthefield.Ifthetableispartitionedthismustbethenameoftheroottable.field_name

Thenameofthefieldtobeindexed.is_default_search_col

Optional.Booleanvalue.Isthistobecomethedefaultsearchcolumn(field)?if_enable_terms

Optional.Booleanvalue.EnabletermssupportonthisfieldwhenaddedtotheGPTextindex.

Returntype

SETOFboolean

Privilegesrequired

OnlytheOWNERcanexecutethisfunction.

Remarks

Callthisfunctionforeachfieldyouadd.

Beforeandafteryouaddoneormorefields,reloadtheSolrconfigurationfileswithgptext.reload_index().Theinitial reload_index() callisrequiredbecauseofSolr4.0behaviorandmaynotberequiredinsubsequentversions.

Afteryouaddoneormorefields,youmustalsocreateandpopulateanewindexwithgptext.create_index() and gptext.index(),thencommitwithindexwithgptext.commit_index().

Example

Addsthefield external_links totheindex,thenrecreates,repopulates,andcommitstheindex.

©CopyrightPivotalSoftware,Inc,2013-2017 66 2.1.2

Page 67: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

=#SELECT*FROMgptext.reload_index('wikipedia.public.articles');=#SELECT*FROMgptext.add_field('wikipedia.public.articles','external_links',false,false);INFO:Addfield:external_linksforindex:wikipedia.public.articlesadd_field-----------t(1row)

=#SELECT*FROMgptext.reload_index('wikipedia.public.articles');reload_index--------------t(1row)

SELECT*FROMgptext.commit_index('wikipedia.public.articles');

gptext.delete()Deletesalldocumentsthatmatchthesearchquery.

Syntax

gptext.delete(index_name,query)

Parameters

index_name

Thenameoftheindex.query

Documentsmatchingthisquerywillbedeleted.Todeletealldocumentsusethequery '*' or '*:*' .

Returntype

boolean

Privilegesrequired

YoumusthavetheDELETEprivilegetoexecutethisfunction.

Remarks

Afterasuccessfuldelete,committheindexasfollows.

gptext.commit_index(index_name)

ExamplesDeletealldocumentscontainingtheword“unverified”inthedefaultsearchfield:

=#SELECT*FROMgptext.delete('wikipedia.public.articles','unverified');delete--------t(1row)=#SELECT*FROMgptext.commit_index('wikipedia.public.articles');commit_index--------------t(1row)

©CopyrightPivotalSoftware,Inc,2013-2017 67 2.1.2

Page 68: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Deletealldocumentsfromtheindex:

wikipedia=#SELECT*FROMgptext.delete('wikipedia.public.articles','*:*');delete--------t(1row)

wikipedia=#SELECT*FROMgptext.commit_index('wikipedia.public.articles');commit_index--------------t(1row)

gptext.drop_field()Removesafieldfromyourschema.

Syntax

gptext.drop_field(index_name,field_name)

Parameters

index_name

Thenameoftheindexfromwhichtodropthefield.Ifthetableispartitionedthismustbethenameoftheroottable.field_name

Thenameofthefieldtodrop.

Returntype

boolean

Privilegesrequired

OnlytheOWNERcanexecutethisfunction.

Remarks

Callthisfunctionforeachfieldyoudrop.

Beforeandafterdroppingoneormorefields,youmustreloadtheSolrconfigurationfileswithgptext.reload_index(),thencommittheindexwithgptext.commit_index().

Thecolumn __partition inindexesforpartitioneddatabasetablescannotbedropped.

Theinitial reload_index() isrequiredbySolr4.0behaviorandmaynotbenecessaryinsubsequentversions.

Example

Dropsthefield external_links fromtheindex.

©CopyrightPivotalSoftware,Inc,2013-2017 68 2.1.2

Page 69: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

=#SELECT*FROMgptext.reload_index('wikipedia.public.articles');reload_index--------------t(1row)

=#SELECT*FROMgptext.drop_field('wikipedia.public.articles','external_links');INFO:Dropfield:external_linksforindex:wikipedia.public.articlesdrop_field------------t(1row)

=#SELECT*FROMgptext.reload_index('wikipedia.public.articles');reload_index--------------t(1row)

=#SELECT*FROMgptext.commit_index('wikipedia.public.articles');commit_index--------------t(1row)

gptext.drop_index()Removesanindex.

Syntax

gptext.drop_index(index_name)

Parameters

index_name

Thenameoftheindextodrop.Ifthedatabasetableispartitioned,thismustbethenameoftheroottable.

Returntype

boolean

Privilegesrequired

OnlytheOWNERcanexecutethisfunction.

Remarks

Adroppedindexcannotberecovered.

Example

=#SELECT*FROMgptext.drop_index('wikipedia.public.articles');drop_index------------t(1row)

©CopyrightPivotalSoftware,Inc,2013-2017 69 2.1.2

Page 70: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

SearchSearchfunctionsenablequeryinganindex.

ChangingthequeryparseratquerytimeWhenusingthesearchfunctions,youcanchangethequeryparserusedbySolratquerytime.Adifferentqueryparsermayberequired,dependingonthenatureofthequery.SeetheUsingAdvancedQueryingOptionsforalistofthequeryparsersGPTextsupports.

Tochangethequeryparseratquerytime,usethe defType Solroptionwiththe gptext.search() function.

Tochangethequeryparserforanysearchfunctionatquerytime,usetheSolr localParams syntax,replacingthe <query> termwith '{!type=edismax}<query>'

.

WiththeGPTextUniversalQueryParser,youcanusefeaturesofanyofthesupportedqueryparsersinonequery.TousetheUniversalQueryParser,replacethe <query> termwith '{!gptextqp}<query>' .SeeUsingtheUniversalQueryParserforinformationandexamples.

gptext.search()Searchesanindex.

Syntax

gptext.search(src_tbl,index_name,search_query,filter_queries)

or:

gptext.search(src_tbl,index_name,search_query,filter_queries[,options])

Parameters

src_table

Specifiesa SELECT statementonanexisting,indexedtableonwhichtoperformthesearch.The src_table parameterisananytabledatatype,specifiedinthefollowingformat

TABLE(SELECT*FROM<src_table>)

index_name

Thenameoftheindextosearch.Ifthedatabasetableispartitionedyoucanspecifythenameofasub-partitiontabletosearch.search_query

TextvaluecontainingaSolrtextsearchquery.filter_queries

Atextarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .options

Anoptionalampersand-delimitedlistofSolrqueryparameters.SeeSolroptions.

Returntype

SETOFgptext.search_scored_result

Thisisacompositetypewiththefollowingcolumns:

Column Type

id text

score doubleprecision

©CopyrightPivotalSoftware,Inc,2013-2017 70 2.1.2

Page 71: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

hs(conditional) gptexthstore

rf(conditional) textColumn Type

The id columnisreturnedastext,evenifthe id_col specifiedinthe gptext.create_index() functionisanintegertype.Ifyouorderresultsby id orjoinsearchresultswiththeoriginaltableon id ,youmustcastthereturned id columntothecorrectintegertypeinyourquery.Forexample,thefollowingsearchqueryreturnsresultssortednumericallybythe id column:

SELECTid,scoreFROMgptext.search(table(select1scatterby1),'demo.public.test','*:*',null)ORDERBYid::INT8,2;

Ifthe options parameterisincludedin gptext.search() ,theresultincludesthe offsets column.Thiscolumncontainskey-valuepairs,wherethekeyisthecolumnnameandthevalueisacomma-separatedlistofoffsetstolocationswherethesearchtermoccurs.Thisdataisusedbythegptext.highlight()functiontoaddhighlightingtagstothecolumndata.Ifhighlightingisnotenabledwiththe 'hl=true' option,the offsets columnis NULL .

Ifthe fl optionisincludedinthe options parametertospecifyadditionalfieldstoaddtotheresult,the rf columncontainstheadditionalfieldsinaformattedtextvalue.Thegptext.gptext_retrieve_field()functioncanbeusedtoextractasinglefieldvaluefromthe rf column.Therearevariantsofthegptext.gptext_retrieve_field()functiontoretrieveintegerandfloatvaluesfromthe rf columnvalue.

Privilegesrequired

YoumusthavetheSELECTprivilegetoexecutethisfunction.

Solroptions

Solrqueriesallowthefollowingoptionalrefinements,specifiedasanampersand-delimitedlistinthe options parameter.

defType

Thenameofthequeryparsertouseforthisquery.

Example: defType=edismax

rows

Themaximumnumberofrowstoreturnpersegment.Ifomitted,allrowsarereturned.

Example: rows=100 returns100rowspersegmentorallrowsiftherearefewerthan100.

sort

Sortsonafieldorscoreinascendingordescendingorder.

Examples:

score desc (defaultifnosortdefined)

date_time asc

date_time asc score desc sortson date_time ascending,thenon score descending

start

Thenumberofthefirstrecordtoreturn.

Examples:

start=0 default:returnedrecordsstartswithrecord0

start=25 returnedrecordsstartswithrecord25

hl

Enablehighlighting.

Example: hl=true

hl.fl

Comma-separatedlistoffieldnamestoconsiderwhenhighlighting.

Examples:

©CopyrightPivotalSoftware,Inc,2013-2017 71 2.1.2

Page 72: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

hl.fl=field1

hl.fl=field1,field2

fl

Comma-separatedlistoffieldstoincludeinsearchresults.Thefieldsmustbesetto stored=true inthemanaged-schemafortheindex.

Example: fl=version,author

RemarksTheoutputincludesatablewithcolumns id (theIDnamedingptext.create_index())and score (the tf-idf score).Acolumnnamed offsets isincludedifhighlightingisspecifiedinthe options parameter.Acolumnnamed rf isincludedwhenalistofadditionalfieldstoincludeisspecifiedinthe options parameter.

Tochangethequeryparseratquerytime,specifythe defType optionintheoptionsparameterlist.Forexample,settingtheoptionsparameterto’rows=100&defType=edismax ’limitstheoutputto100rowspersegmentandwillchangethequeryparserto edismax .

The TABLE queryisplannedandaffectstheestimatefor gptext.search() ,butdoesnotexecute.Forexample,ifyourqueryincludes

gptext.search(TABLE(SELECT*FROMt),...)

thequeryplannerestimatesthenumberofresultsasthenumberofrowsin t .Thiscancausethequeryplannertoignoretheuseofanindexscan.Useaquerylike TABLE(SELECT1SCATTERBY1) toavoidthisissue.

Ifyoudonotspecifyoptions, gptext.search() returnsallrows.

Theoptionsseparatorhaschangedfromcommatoampersand(&)inordertosupporthighlighting.Ifyoudonotusehighlighting,youcanreverttousingthecommaseparatorbysettingthe gptext.search_param_separator to ',' .

TheSolroption rows specifiesthemaximumnumberofrowstoreturnpersegment.Forexample,ifyouhavefoursegments, ‘rows=100' returnsatmostatotalof400rows.Tolimitthenumberofrowsreturnedforanentirequery,seta LIMIT intheSQLquery.Forexample,thefollowingqueryreturnsatmost20rowsforthequery: selectt.id,q.score,t.message\_textfromtwitter.messaget,gptext.search(...)qwheret.id=q.id::int8LIMIT'20';

The gptexthstore typeisalimitedformofthePostgres hstore type.Itonlyhasthe hstore inputandoutputfunctionsimplemented,as gptext_hstore_in

and gptext_hstore_out .

ExamplesRunsaGPTextquerythatlooksforWikipediaarticlesthatcontaintheterm“optimization”,andjoinstheresultstotheoriginalGreenplumDatabasearticles table:

=#SELECTa.id,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT*FROMarticles),'wikipedia.public.articles','optimization',null)qWHEREa.id=q.id::int8ORDERBYscoreDESC;id|title|score----------+----------------+-------------43835553|Cubesort|0.05420493329050607|Cyclesort|0.0442581446508027|Treesort|0.03613662329352|Selectionsort|0.03433720420039|Mergesort|0.0335375933268249|Quicksort|0.03090701825977485|Bubblesort|0.02861433723954341|Timsort|0.02336350877355|Shellsort|0.02312869215205|Insertionsort|0.0228914713995|Heapsort|0.011179198(11rows)

Returns3resultsfromeachshardbeginningatrecord0:

©CopyrightPivotalSoftware,Inc,2013-2017 72 2.1.2

Page 73: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

=#SELECTa.id,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT*FROMarticles),'wikipedia.public.articles','optimization',null,'rows=3&start=0')qWHEREa.id=q.id::int8;id|title|score----------+----------------+-------------13995|Heapsort|0.01117919820039|Mergesort|0.0335375936508027|Treesort|0.03613662315205|Insertionsort|0.0228914777355|Shellsort|0.02312869223954341|Timsort|0.023363508(6rows)

Returns10rowswiththetext“iphone”highlightedinthe message column.

SELECTt.id,gptext.highlight(t.message,'message',s.hs)FROMtwitter.messaget,gptext.search(TABLE(SELECT1SCATTERBY1),'demo.twitter.message','{!gptextqp}iphone',null,'rows=10&hl=true&hl.fl=message')sWHEREt.id=s.id::int8;

gptext.search_count()Returnsthenumberofdocumentsthatmatchthesearchquery.

Syntax

gptext.search_count(index_name,search_query,filter_queries,options)

Parameters

index_name

Thenameoftheindex.search_query

Thesearchquery.filter_queries

Acomma-delimitedarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .options

Anoptionalampersand-delimitedlistofSolrqueryparameters.SeeSolroptions.

Returntype

bigint

Privilegesrequired

YoumusthavetheSELECTprivilegetoexecutethisfunction.

Remarks

None.

Example

©CopyrightPivotalSoftware,Inc,2013-2017 73 2.1.2

Page 74: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

=#SELECT*FROMgptext.search_count('wikipedia.public.articles','bubble',null);count-------4(1row)

gptext.gptext_retrieve_field()Retrievesasinglefieldfromthe rf searchresultcolumnasatextvalue.

Syntax

gptext.gptext_retrieve_field(rf,field_name)

Parameters

rf

Thenameofthe rf column.field_name

Thenameofthefieldtoretrieve.

Remarks

The fl=<field_list> Solrsearchoptionisaddedtothe options parameterofthe gptext.search() functiontorequestadditionalstoredfields.Theadditionalfieldsarereturnedintheresultsinacolumnnamed rf .The rf columnvaluehasaformatlikethefollowing:

column_value{name:"_version"value:"1544714234398507008"}column_value{name:"revision"value:"9.70"}column_value{name:"author"value:"jdough"}

The gptext.gptext_retrieve_field() functionextractsthevalueforasinglespecifiedfieldandreturnsitasatextvalue.Ifthereisnofieldwiththespecifiednameinthe rf column,itreturnsNULL.

Storingadditionalfieldsinanindexrequiresediting managed-schema tospecifythefieldsthatshouldbestored.SeeStoringAdditionalFieldsinanIndexforinstructions.

gptext.gptext_retrieve_field_int()Retrievesasinglefieldfromthe rf searchresultcolumnasanintegervalue.

Syntax

gptext.gptext_retrieve_field_int(rf,field_name)

Parameters

rf

Thenameofthe rf column.field_name

Thenameoftheintegerfieldtoretrieve.

Remarks

©CopyrightPivotalSoftware,Inc,2013-2017 74 2.1.2

Page 75: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

The gptext.gptext_retrieve_field_int() functionisthesameasthegptext.gptext_retrieve_fieldfunction,exceptthattheextractedfieldvalueisconvertedtoanintegervalue.

gptext.gptext_retrieve_field_float()Retrievesasinglefieldfromthe rf searchresultcolumnasafloatvalue.

Syntax

gptext.gptext_retrieve_field_float(rf,field_name)

Parameters

rf

Thenameofthe rf column.field_name

Thenameofthefloatfieldtoretrieve.

Remarks

The gptext.gptext_retrieve_field_float() functionisthesameasthegptext.gptext_retrieve_fieldfunction,exceptthattheextractedfieldvalueisconvertedtoafloatvalue.

gptext.highlight()Highlightstermsbyinsertingmarkuptagsintodata.

Syntax

gptext.highlight(column_data,column_name,offsets)

Parameters

column_data

Thetextdatafromthetablewhichistobetaggedwithhighlightingtags.column_name

Thenameofthecorrespondingcolumnfromthetable.offsets

A gptext hstore valuethatcontainskey-valuepairsthatindicatethelocationsofthetexttohighlightwithinthecolumndata.SeeRemarksforinformationaboutthe gptext hstore datatype.

PrequisiteTousehighlighting,termvectorsmustbeenabledbeforecreatingtheindex.Toenabletermvectors,callgptext.enable_terms()foreachfieldwhereyouwanttoenableterms,thenindexorre-indexwithgptext.index().

RemarksThe offsets parameterisa gptexthstore ,whereeachkeyisacolumnnameandthevalueisacomma-separatedlistofoffsetsintothecolumndata.Thishstore isconstructedbygptext.search()withhighlightingenabledinthe offsets parameter.Followingisanexampleofthe offsets hstore content:

©CopyrightPivotalSoftware,Inc,2013-2017 75 2.1.2

Page 76: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

"field1"=>"0:5,9:14","field2"=>"13:20"

gptext.highlight() willinserttwosetsoftagsintothe field1 dataandonesetintothe field2 dataattheindicatedoffsets.

The gptexthstore typeisalimitedformofthePostgres hstore type.Ithasonlythe hstore inputandoutputfunctionsimplemented,as gptext_hstore_in

and gptext_hstore_out .

Thehighlighttagsaredefinedbythe gptext.hl_pre_tag and gptext.hl_post_tag serverconfigurationparameters.Theirdefaultvaluesare <em> and</em> ,respectively.

ExampleReturns10rowswiththetext“iphone”highlightedinthe message column.

SELECTt.id,gptext.highlight(t.message'message',s.hs)FROMtwitter.messaget,gptext.search(TABLE(SELECT1SCATTERBY1),'demo.twitter.message','{!gptextqp}iphone',null,'rows=10&hl=true&hl.fl=message')sWHEREt.id=s.id::int8;

FacetedSearchFacetingbreaksupasearchresultintomultiplecategories,showingcountsforeach.

gptext.faceted_field_search()The faceted_field_search() functionbreakssearchresultsintofieldnamecategories.

Syntax

gptext.faceted_field_search(index_name,query,filter_queries,facet_fields,facet_limit,minimum,options)

Parameters

index_name

Thenameoftheindex.query

Querystatement.Use *:* toqueryforallresults.filter_queries

Atextarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .facet_fields

Anarrayoffieldnamestofacet.UsePostgreSQLarraynotation.facet_limit

Maximumnumberofresultstobereturnedforeachaggregation(facet).minimum

Minimumnumberofresultsrequiredbeforeanaggregation(facet)willbereturned.Enter0toreturnallfacets.options

Anoptionalampersand-delimitedlistofSolrqueryparameters.SeeSolroptions.

Returntype

SETOFgptext.facet_field_result

Thisisacompositetypewiththefollowingcolumns:

©CopyrightPivotalSoftware,Inc,2013-2017 76 2.1.2

Page 77: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Column Type

field_name text

field_value text

value_count bigint

Privilegesrequired

YoumusthavetheSELECTprivilegetoexecutethisfunction.

Remarks

None.

ExamplesFaceton spam and truncated fields,nolimit,nominimum,alltweets:

SELECT*FROMgptext.faceted_field_search('twitter.public.message','*:*',null,'{spam,truncated}',0,0);

Faceton author_id ,nolimit,withaminimumoffivetweets,alltweets:

SELECT*FROMgptext.faceted_field_search('twitter.public.message','*:*',null,'{author_id}',0,5);

gptext.faceted_query_search()The faceted_query_search() functionbreakssearchresultsintocategoriesdefinedbyqueriesthatyouprovide.

Syntax

gptext.faceted_query_search(index_name,query,filter_queries,facet_queries,options)

Parameters

index_name

Thenameoftheindex.query

Querystatement.Use *:* toqueryforallresults.filter_queries

Atextarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .facet_queries

Type:text[].Required.Anarrayoffacetqueries.options

Anoptionalampersand-delimitedlistofSolrqueryparameters.SeeSolroptions.

Returntype

SETOFgptext.facet_query_result

Thisisacompositetypewiththefollowingcolumns:

Column Type

©CopyrightPivotalSoftware,Inc,2013-2017 77 2.1.2

Page 78: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

query_text text

value_count bigint

Column Type

Privilegesrequired

YoumusthavetheSELECTprivilegetoexecutethisfunction.

Remarks

None.

Example

Facetonproductpriceranges(0-100,101-200,201-300,300+)ofcameras:

SELECT*FROMgptext.faceted_query_search('store.public.catalog','product_type:camera',null,'{price:[*TO100],price:[101TO200],price:[201TO300],price:[301TO*]}');

gptext.faceted_range_search()The faceted_range_search() functionbreakssearchresultsintorangecategories,withrangesdefinedbythe range_start , range_end ,and range_gapparameters.

Syntax

gptext.faceted_range_search(index_name,query,filter_queries,field_name,range_start,range_end,range_gap,options)

Parameters

index_name

Thenameoftheindex.query

Querystatement.Use *:* toqueryforallresults.filter_queries

Antextarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .field_name

Thenameofthefieldonwhichtofacet.range_start

Beginningoftherange.range_end

Endoftherange.range_gap

Sizeofrangeincrement,atextvalue.options

Anoptionalampersand-delimitedlistofSolrqueryparameters.SeeSolroptions.

Returntype

SETOFgptext.facet_range_result

©CopyrightPivotalSoftware,Inc,2013-2017 78 2.1.2

Page 79: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Thisisacompositetypewiththefollowingcolumns:

Column Type

field_name text

range_value text

value_count bigint

Privilegesrequired

YoumusthavetheSELECTprivilegetoexecutethisfunction.

Remarks

None.

Example

Facetondaterange,starting1yearago,endingnow,everyday:

SELECT*FROMgptext.faceted_range_search('twitter.public.message','*:*',null,'created_at','NOW/YEAR-1YEAR','NOW/YEAR','+1DAY');

WorkingwithTerms

gptext.terms()GetsthetermvectorsfortheindexeddocumentsinaSolrindexforthespecifiedfield.Youcanuse gptext.terms() tocreatetables.

Syntax

gptext.terms(src_table,index_name,field_name,search_query,filter_queries[,options])

Parameters

src_table

An anytable valuethatspecifiesa SELECT statementonanexisting,indexedtableonwhichtoperformthesearch.Specifyintheformat:

TABLE(SELECT*FROM<src_table>;)

index_name

Thenameoftheindextoqueryforterms.field_name

Thenameofthefieldtoqueryforterms.search_query

Aquerythatthefieldmustmatch.filter_queries

Acomma-delimitedarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .options

Anoptional,comma-delimitedlistofSolrqueryparameters.

©CopyrightPivotalSoftware,Inc,2013-2017 79 2.1.2

Page 80: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Returntype

SETOFgptext.term_info

Thisisacompositetypewiththefollowingcolumns:

Column Type

id text

term text

positions integer[]

Privilegesrequired

YoumusthavetheSELECTprivilegetoexecutethisfunction.

RemarksToenableusing gptext.terms() ,executetheGPTextfunctiongptext.enable_terms(),thenreindexwithgptext.index().

The TABLE queryisplannedandaffectstheestimateofof gptext.terms() ,butdoesnotexecute.Forexample,ifyourqueryincludes:

gptext.terms(TABLE(SELECT*FROMt),...)

Thequeryplannerestimatesthenumberofresultsasthenumberofrowsin t .Thiscancausethequeryplannertoignoretheuseofanindexscan.Useaquerylike TABLE(SELECT1SCATTERBY1) toavoidthisissue.

Examples-CreateaTermsTable

CREATETABLEtwitter.termsASSELECT*FROMgptext.terms(TABLE(SELECT*FROMtwitter.message),'demo.twitter.message','message_text','iphone',null);

ConfigurationandMonitoringIndexconfigurationandmonitoringfunctionsenablemanagingindexes,trackingindexstatistics,checkingstatusofindexsegments,andensuringthatindexcontentsarecurrent.

gptext.cluster_status()ShowsSolrclusterstatus.

Syntax

gptext.cluster_status()

ReturnType

SETOFgptext.cluster_status_result

Thisisacompositetypewiththefollowingcolumns:

Column Type

index_name text

©CopyrightPivotalSoftware,Inc,2013-2017 80 2.1.2

Page 81: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

max_shards_per_node integer

router text

replication_factor integer

auto_add_replicas boolean

znode_version integer

config_name text

partitioned boolean

Column Type

gptext.index_status()Showsstatusofreplicasforaspecifiedindexorforallindexes.

Syntax

gptext.index_status([index_name])

Parameters

index_name

Thenameoftheindex.Optional.Returnsstatusforallindexesifnoindexisspecified.

ReturnType

SETOFgptext.index_status_result

Thisisacompositetypewiththefollowingcolumns:

Column Type

content_id smallint

index_name text

shard_name text

shard_state text

replica_name text

replica_state text

core text

node_name text

base_url text

is_leader boolean

partitioned boolean

gptext.partition_status()ListsindexesonpartitionedtablesorchildpartitionnamesinthecurrentGreenplumdatabase.

Syntax

gptext.partition_status([index_name])

©CopyrightPivotalSoftware,Inc,2013-2017 81 2.1.2

Page 82: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Parameters

index_name

Optional.Returnspartitionstatusforallindexesifnoindexisspecified.

ReturnType

SETOFgptext.partition_status_result

Thisisacompositetypewiththefollowingcolumns:

Column Type

partition_name text

inherits_name text

level integer

Example

Listpartitionstatusforanindex.

SELECT*FROMgptext.partition_status('demo.public.tweets');partition_name|inherits_name|level------------------------------+--------------------+-------demo.public.tweets_1_prt_1|demo.public.tweets|1demo.public.tweets_1_prt_2|demo.public.tweets|1demo.public.tweets_1_prt_3|demo.public.tweets|1demo.public.tweets_1_prt_4|demo.public.tweets|1demo.public.tweets_1_prt_5|demo.public.tweets|1demo.public.tweets_1_prt_6|demo.public.tweets|1demo.public.tweets_1_prt_7|demo.public.tweets|1demo.public.tweets_1_prt_8|demo.public.tweets|1demo.public.tweets_1_prt_9|demo.public.tweets|1demo.public.tweets_1_prt_10|demo.public.tweets|1...

Remarks

The gptext.partition_status() functioncanonlylisttheindexpartitionsfortablesinthecurrentGreenplumdatabase.

gptext.reload_index()ReloadsSolrconfigurationfilesiftheyhavebeenmodified.

Syntax

gptext.reload_index([index_name])

Parameters

index_name

Optional.Thenameoftheindexforwhichtoreloadtheconfigurationfiles.

Returntype

boolean

©CopyrightPivotalSoftware,Inc,2013-2017 82 2.1.2

Page 83: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Privilegesrequired

OnlytheOWNERcanexecutethisfunction.

Remarks

None.

Example

=#SELECT*FROMgptext.reload_index('wikipedia.public.articles');reload_index--------------t(1row)

gptext.version()ReturnstheversionofyourGPTextinstallation,includingbuildnumber.

Syntax

SELECT*FROMgptext.version()

Parameters

None.

Returntype

text

Privilegesrequired

Youdonotneedanyprivilegestoexecutethisfunction.

Remarks

None.

Example

=#SELECT*FROMgptext.version();version-----------------------------------------GreenplumTextAnalytics2.1.0(main)(1row)

HighAvailability

©CopyrightPivotalSoftware,Inc,2013-2017 83 2.1.2

Page 84: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

gptext.add_replica()Addsareplicaofanindexshard.

Syntax

gptext.add_replica(index_name,shard_name[,node_name])

Parameters

index_name

Nameoftheindex.Iftheindexisforapartitioneddatabasetable,thismustbethenameoftheroottable.shard_name

Nameoftheshardtoreplicate.node_name

Nameofthenodewherethereplicaistobeadded.Optional.Ifomitted,SolrCloudchoosesthenode.

Returntype

boolean

Remarks

ThisfunctionisusedbytheGPTextmanagementutility gptext-replicaadd .

Thevalueofthe gptext.replication_factor configurationparameterwhenanindexiscreateddetermineshowmanyreplicasarecreatedforeachshard.InaGreenplumsystem,therearethesamenumberofshardsasthereareGreenplumsegments.Thenumberofreplicascreatedforanewindexisthenumberofsegmentstimesthevalueofthe gptext.replication_factor configurationparameter,2bydefault.ThereplicasaredistributedevenlyamongtheliveGPTextnodes.

Replicasconsumespaceonthehostwheretheyarecreated,sotheyareusuallyonlycreatedtoreplaceareplicathathasfailedorbecomeunavailableortorelocateareplicatoanotherGPTextinstance.Whenaddingreplicas,youshouldmaintainequaldistributionofreplicasamongtheGPTextnodesandavoidplacingmultiplereplicasforthesameshardonthesamehost.

ThetotalnumberofreplicasforanindexthatcanbeplacedoneachGPTextnodeissetwhentheindexiscreated.(InSolr,thisistheMaxShardsPerNodeparameter.)GPTextsetsthislimitbycalculatingthenumberofreplicastocreatepernodeandaddinganadditionalfactor,specifiedinthegptext.extension_factor serverconfigurationparameter.Thisparametercanbesetbetween0and10;thedefaultvalueis2.Sincethelimitissetwhentheindexiscreated,itisrecommendedtosetthe gptext.extension_factor parametertoahighernumbertoallownewreplicastobecreatedwhennecessary.

Example

=#SELECT*FROMgptext.add_replica('wikipedia.public.articles','shard1');success|core_name---------+-------------------------------------------t|wikipedia.public.articles_shard1_replica4(1row)

gptext.delete_replica()Deletesanamedreplicafromthespecifiedindexandshard.

Syntax

gptext.delete_replica(index_name,shard_name,replica_name[,only_if_down])

©CopyrightPivotalSoftware,Inc,2013-2017 84 2.1.2

Page 85: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Parameters

index_name

Nameoftheindex.shard_name

Nameoftheshardthatcontainsthereplicatodelete.replica_name

Nameofthereplicatoremove.only_if_down

Optional.Whentrue,noactionistakenifthereplicaisactive.Defaultisfalse.

Returntype

boolean

Remarks

Usethegptext.index_status()functiontofindthenameofthereplicatodrop.Namesareintheformat core_nodeX ,where X isanumber.

Thisfunctioniscalledfromthe gptext-replicadrop managementutility.

Examples1. Deletethe core_node5 replicaifitisdown.

=#SELECT*FROMgptext.delete_replica('wikipedia.public.articles','shard1','core_node5',true);ERROR:Deletereplicafailed:Attemptedtoremovereplica:wikipedia.public.articles/shard1/core_node5withonlyIfDown='true',butstateis'active'.CONTEXT:SQLfunction"delete_replica"statement1

2. Deletethe core_node5 replicaevenifitisactive.

=#SELECT*FROMgptext.delete_replica('wikipedia.public.articles','shard1','core_node5');success---------t(1row)

GeneralPurposeFunctions

gptext.count_t()Countsthenumberofrecordsinatable.

Syntax

gptext.count_t(table_name)

Parameters

table_name

Nameofthetableforwhichtocountrecords.

Returntype

©CopyrightPivotalSoftware,Inc,2013-2017 85 2.1.2

Page 86: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

integer

Privilegesrequired

YouneedSELECTprivilegeson table_name toexecutethisfunction.

Example

=#SELECT*FROMgptext.count_t('wikipedia.public.articles');count_t---------11(1row)

©CopyrightPivotalSoftware,Inc,2013-2017 86 2.1.2

Page 87: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

GPTextManagementUtilitiesManagementutilitiesareGPTextcommand-lineutilitiesthatareusedtomanagetheGPTextcluster.TheutilitiesmustberunontheGreenplummasterasthegpadminuser.

ToensuretheGPTextcommand-lineutilitiescanbefoundonthepath,sourcetheGreenplumDatabaseandGPTextenvironmentscripts.TheGreenplumDatabaseenvironmentmustbesetbeforeyousourcetheGPTextenvironmentscript.Forexample,ifbothGreenplumDatabaseandGPTextareinstalledinthe/usr/local/directory,enterthesecommands:

$source/usr/local/greenplum-db-/greenplum_path.sh$source/usr/local/greenplum-text-/greenplum-text_path.sh

HelpTogethelpforautility,specifytheflag -h or --help .Ashorthelpmessagedisplayswithalistofparameters.

DebuggingTogetverboseoutputfordebuggingautility,specifytheflags -v or --verbose .

GPTextUtilitiesgptext-backup–backsupaGPTextindextoasharedfilesystem.

gptext-config–performsGPTextconfigurationoptions.

gptext-expand–addsnewGPTextnodestoexistinghostsinthecluster.

gptext-installsql–installsorremovesthegptextschemaanduser-definedfunctionsinGreenplumdatabases.

gptext-migrator-installstheGPTextbinariesintoanupgradedGreenplumDatabasesystem.

gptext-recover–restartsGPTextnodesthataredown.

gptext-replica–addsordropsareplicaofanindexshard.

gptext-restore–restoresaGPTextindexfromabackuponasharedfilesystem.

gptext-start–startsorrestartstheGPTextcluster.

gptext-state–displaythestateoftheGPTextclusterandindexes.

gptext-stop–shutsdowntheGPTextcluster.

gptext-uninstall–uninstallsGPText,includingdataandinstalledfiles,andZooKeepernodesiftheywereinstalledwiththeGPTextinstaller.

gptext-upgrade-upgradesaGPTextsystemtoanewGPTextversion.

zkManager–checkstheZooKeeperclusterstate.IfZooKeeperwasinstalledwithGPText, zkManager canstartorstoptheZooKeepercluster.

gptext-backupBacksupaGPTextindextoasharedfilesystem.

Syntax

gptext-backup-h

gptext-backup-p<path>-i<index>-n<name>[-v]

Parameters

-h

©CopyrightPivotalSoftware,Inc,2013-2017 87 2.1.2

Page 88: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

--help Displaysausagemessageandexits.

-ppath

--pathpath

Thepathwherethesharedfilesystemismountedoneachhost.Thefilesystemmustbeaccessiblefromallhostsintheclusterandreadableandwritablebythegpadminuser.

-iindex_name

--indexindex_name ThenameoftheGPTextindextobackup.

-nbackup_name

--namebackup_name Anameforthebackup.

Example

gptext-backup-imyindex-p/mnt/backupfs/gptext-backups-nmybackup

NotesYoucanbackupanindexsothatyoucanrestoreittoadifferentGPTextsystemoravoidhavingtoreindexiftheexistingindexbecomescorrupted.

ThesharedfilesystemmustbemountedonallhostswithGPTextnodesandmustbewritablebythegpadminuser.Thefilesystemcouldbe,forexample,anNFSmountoraSSHserverwithsshfssupport.Thefilesystemmustbeconfiguredandaccessiblebeforeyouexecutethe gptext-backup utilityandabletoacceptconnectionsfromeachhostinthecluster.

The gptext-backup utilitycreatesanewsubdirectoryatthespecifiedpathwiththebackupnamespecified.Thecommandfailsifthedirectoryalreadyexists.

Whenthebackupiscomplete,thebackupdirectorycontainsthefollowing:

backup.infoAtextfilecontainingthreecomma-separatedstrings:thedatabasename,schemaname,andindexnamefortheindexthatwasbackedup.

backup.propertiesAtextfilewithpropertiesthatdescribethebackup,suchasthedateandtimethebackupstarted,thenameofthebackup,andthenamesoftheSolrcollectionandcollectionconfiguration.

zk_backupAdirectorycontainingthefollowing:

collection_state.json –aJSONfiledescribingthestatusoftheSolrcollection.

configs/<collection-name>/ –adirectorycontainingcopiesoftheSolrconfigurationfilesstoredinZooKeeperfortheindex,forexample managed-

schema , solrconfig.xml , protwords.txt , stopwords.txt .

snapshot.shard0…snapshot.shard_N_Adirectoryforeachshard,withthefilescontainingcontentoftheshard.

Ifthebackupfails—forexampleifthereisinsufficientdiskspace—anerrormessageisdisplayed,butthebackupdirectoryisnotremoved.Besuretoremovethebackupdirectorybeforerestartingthebackup.

gptext-configPerformsGPTextconfigurationtasks:

Edit,add,oruploadconfigurationfilesinZooKeeper

RevertconfiguredfilesinZooKeeper

EditJVMconfigurationoptions

©CopyrightPivotalSoftware,Inc,2013-2017 88 2.1.2

Page 89: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

UploadjarfilestotheGPTexthomedirectory

Syntax

gptext-config-h|--help

gptext-config-ffile_name-iindex_name[-r][-e][-b]

gptext-config-aappend_file-ffile_name-iindex_name

gptext-config-upath/local_file_name-fpath/zookeeper_file_name-iindex_name

gptext-config-jpath/jar_file

gptext-config-ojvm_options

gptext-config-ffile_name-iindex_name-afile_to_append

gptext-config-ksolr_key-s<solr_value>

Parameters

-h

--help Displaysausagemessageandexits.

-iindex-name

--index=index-nameNameoftheindex.Iftheindexisforapartitionedtable,youmustspecifytherootpartitionname.

-ffilename

--file=filename

Thenameofafiletoedit,append,orupload.The -i optionmustbeincludedtospecifytheindex.Thefollowingfilesaresupported:

solrconfig.xml –ContainsmostoftheparametersforconfiguringSolritself(seeConfiguringsolrconfig.xml attheApacheSolrwebsite).

schema.xml –DefinestheanalysischainsthatSolrusesforvariousdifferenttypesofsearchfields(seeSettingupTextAnalysisChains).

stopwords.txt –Listswordsyouwanttoeliminatefromthefinalindex.Youcanalsoeditlanguagespecificstopwordsbyspecifyingafilenameintheformat stopwords_language_code.txt ,wherelanguage_code isatwo-charactercodesuchas en , fr ,or es .

protwords.txt –Listsprotectedwordsthatyoudonotwanttobemodifiedbytheanalysischain.Forexample,iPhone.

synonyms.txt –Listswordsthatyouwantreplacedbysynonymsintheanalysischain.

emoticons.txt –Definesemoticonsforthe text_sm socialmediaanalysischain(seegptext-start).

currency.txt –Definesexchangeratesbetweenonecurrencyandanother(seeWorkingwithCurrenciesandExchangeRates attheApacheSolrwebsite).

jar_file–thenameofajarfiletouploadtoGPText_Install_Directory/lib/.

-ecommand

--editor=command

Editortouse.Choicesareanyeditorthattakesafilenameonthecommandlineasaparameter.Forexample,vi,vim,emacs,nano,etc.Ifabsent,viisused.

-cstats-list

--stats_columns=stats-list

Usedwiththe -i or --index option,specifiesacomma-separatedlistofstatisticstodisplay.Thelistmaycontain replication_factor ,max_shards_per_node , num_docs ,and size_in_bytes .Ifno -c or--stats_columns optionissupplied,allfourstatisticsaredisplayed.

©CopyrightPivotalSoftware,Inc,2013-2017 89 2.1.2

Page 90: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

-afilename

--append=filename

Appendsanamedfiletoaconfigurationfileanddistributestheresultingfiles.Requiresthe -f and -i parameters. -f namestheconfigurationfiletowhichyouwanttoappendthefilenamed(includinglocalpath)withthe -a parameter.

-r

--revert

filename

Revertnamedfiletopreviousversion.

-bn

--batch_size=nHowmanySolrinstancestoconfigureconcurrently.Thedefault(64)isgenerallymorethanenough.Alargernumbermayincreasespeed.

-ulocal_file_path

--uploadlocal_file_path

Uploadaconfigurationfileatlocal_file_pathtoZooKeeper.SpecifythedestinationZookeeperfilenamewiththe -f optionandspecifytheindexnamewiththe -i option.

-jjarfile

--jar=jarfile UploadsajarfiletoGPText_Install_Directory/lib/.

-o“JVM_Options”ModifiesJVMoptions.ToensurethattheJVMsarerestartedafterchangingJVMoptions,restarttheGPTextclusterusingthe gptext-stopand gptext-start utilities.

-k

--solr_prop=

-s

---solr_val=

Setsacustompropertyinthe solr.xml configurationfile.The-koptionspecifiesthenameoftheproperty.The -s optionspecifiesthevalue.Currently,theonlycustomGPTextpropertyisthe trackCommitproperty,whichenablesordisablescommittracking.

NotesUsethe gptext-config utilitytoedittheconfigurationfilesforaspecifiedindex.

gptext-config automaticallyreindexesaftereditingfilesiftheconfigurationchangesmaderequireit.

Ifyouusethe -f ( --file )parametertoeditoneoftheindexconfigurationfiles,GPTextautomaticallyplacestheeditedfileinitsproperdirectory.

Tomoveanindexconfigurationfilefromthelocalfilesystemtotheindexconfigurationdirectoryinallofthesegments,specifythelocalfilewiththe -u (--upload )optionandthedestinationfilewiththe -f (–file`)option.

The -k ( --solr_prop )and -s ( --solr_val )parametersmustalwaysbeusedtogether.TheysetacustomGPTextpropertyinthe solr.xml configurationfile.Currently,theyareonlyusedtoenableordisablecommittracking.Changesto solr.xml donottakeeffectuntiltheGPTextinstanceisrestarted.

Examples1. Editthe managed-schema fileinindex wikipedia.public.articles ,usingthevieditor:

gptext-config-fmanaged-schema-iwikipedia.public.articles-evi

2. Appendthefile stopwords.add to stopwords.txt inindex wikipedia.public.articles :

gptext-config-astopwords.add-fstopwords.txt-iwikipedia.public.articles

Nevereditthetemplateconfigurationfiles.Ifyoudo,everyindexyoucreateaftereditingthetemplateswillbecreatedwithyourmodifiedversions.Usethe gptext-config utilitytoensurethatyouareeditingtheconfigurationfilesforyourindex,ratherthanthetemplateconfigurationfiles.

©CopyrightPivotalSoftware,Inc,2013-2017 90 2.1.2

Page 91: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

3. Revertfile managed-schema inindex wikipedia.public.articles aftereditingit.

gptext-config-fmanaged-schema-iwikipedia.public.articles-r

4. Uploadthelocalfile custom.txt totheZooKeeperfile custom.conf inindex wikipedia.public.articles :

gptext-config-ucustom.txt-fcustom.conf-iwikipedia.public.articles

5. Uploadjarfile text.jar tothe lib directoryintheGPTexthomedirectory:

gptext-config-jtext.jar

6. SetJVMoptions:

gptext-config-o"-Xms256M-Xmx400M"

gptext-expandExpandsaGPTextclusterbyaddingnewGPTextnodestoexistinghostsinaGPTextclusterortohostsaddedbytheGreenplumDatabase gpexpandmanagementutility.ReplicasforindexescreatedafterthenewGPTextnodesareaddedwillbedistributedacrossthenewandexistingnodes.Documentsmustbereindexedtorebalancereplicasonexistinghostsor,afterexpandingtheGreenplumcluster,toredistributetheindextonewshards.

Synax

gptext-expand-h

gptext-expand-e-p<paths>[-d<database>][-v]

gptext-expand-H<new-hosts>[-d<database>][-v]

Parameters

-h

--help Displaysausagemessageandexits.

-e

--existingAddsGPTextnodestoexistinghostsintheGPTextcluster.The`-p`optionmustalsobesuppliedtospecifythedatadirectoriesforthenewnodes.

-p

--expand_paths

SpecifiespathstodirectorieswherethenewGPTextnodes’datadirectoriesaretobecreated.ThesedirectoriesshouldbeparalleltotheGreenplumDatabasesegmentdatadirectories.Ifthereismorethanonedirectory,placetheminacomma-delimitedlist,forexample-p /data1/nodes,/data1/nodes,/data2/nodes .Requiredwhenexpandingonexistinghosts.

-H

--new_hosts

SpecifiesthenewhostsonwhichGPTextistobeinstalled.Placemultiplehostnamesinacomma-delimitedlist,forexample-H host1,host2,host3 .SeeNotesforrequirementsfornewhosts.

-d

--database SpecifiesthenameofadatabasecontainingGPTextschema.

-v

--verbose Displaysdebugoutput.

©CopyrightPivotalSoftware,Inc,2013-2017 91 2.1.2

Page 92: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

NotesThe -p and -d optionscannotbeusedtogether.

WhennewhostsareaddedtotheGreenplumDatabasecluster,ensurethatthefollowingGPTextprerequisitesareinstalledbeforerunning gpexpand :

OracleJava1.8Python2.6orgreaterLinux lsof utilityAllhostsintheclustermustbeabletoreachthenewandexistinghosts.

Existingreplicasarenotautomaticallyredistributed.TorebalancereplicasamongtheexpandedGPTextcluster,youmustreindex.

Whenexpandingtonewhosts,youmustreindextoredistributetheindexamongexistingandnewshards.

gptext-installsqlInstallsorremovesthegptextschemaanduser-definedfunctionsindatabases.

Syntax

gptext-installsql-h

gptext-installsql[-c][-v]db_name[db2_name...]

Parameters

-c

--clean RemovesthegptextschemaandUDFsfromthespecifieddatabases.

-h

--helpDisplaysausagemessageandexits.

-v

--verbose Displaysdebugoutput.

NotesThe gptext schemaisreservedforusebyGPText.The gptext-installsql utilitydropsandrecreatestheschema.IfyouaddanydatabaseobjectstotheschematheywillbelostwhenyoureinstalltheschemaorupgradetheGPTextsystem.

Examples1. InstallGPTextUDFsindatabases wikipedia and twitter .

$gptext-installsqlwikipediatwitter20160603:14:03:53:305130gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Creating'gptext'schemaandUDFsindatabasewikipedia...20160603:14:03:53:305130gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Creating'gptext'schemaandUDFsindatabasetwitter...20160603:14:03:54:305130gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Validatinggptextinstallation20160603:14:03:59:305130gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Done.

2. DeleteGPTextUDFsindatabase wikipedia .

©CopyrightPivotalSoftware,Inc,2013-2017 92 2.1.2

Page 93: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

$gptext-installsql-cwikipedia20160603:14:03:04:304847gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Connectingtodatabasewikipedia20160603:14:03:05:304847gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Dropping'gptext'schemaandUDFs...20160603:14:03:05:304847gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Validatingcleanoperation20160603:14:03:09:304847gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Done.

gptext-migratorMigratesthecurrentGPTextsystemintoanupgradedGreenplumDatabasecluster.

Syntax

gptext-migrator[-h|--help]

gptext-migrator[-v|--verbose]

NotesThe gptext-migrator utilityrelocatesthecurrentGPTextsystemtoanewGreenplumDatabaserelease.

TheutilitydeterminesthedestinationGreenplumDatabasereleasefromtheenvironment.IftheGPTextsystemhasalreadybeenmigrated,orifthedestinationGreenplumreleaseisunsupported, gptext-migrator outputsamessageandquits.

IfyouareupgradingGPTextandGreenplumDatabaseatthesametime,completetheGreenplumDatabaseupgradefirst,andthenuse gtext-migrator toaddthecurrentGPTextversiontothenewGreenplumDatabaseinstallation.Finally,use gptext-upgrade toupgradethesystemtothenewGPTextversion.

gptext-recoverRecoversGPTextnodes.

Syntax

gptext-recover-h

gptext-recover-f[-v]

gptext-recover-H<new-host1>,<new-host2>,...[-v]

gptext-recover-r[-v]

Parameters

-h

--help Displaysausagemessageandexits.

-f

--force

ForcesrecoveryforanyGPTextnodesthataredown.Ifthenodeisunrecoverable,deletesthenode,createsanewnode,andrecreatesreplicas.

-H

--new-hostsRecoverdownnodesonnewhosts.Forexample“host1,host2”.Thenumberofnewhostsmustbeequaltothenumberoffailedhosts.

-r

©CopyrightPivotalSoftware,Inc,2013-2017 93 2.1.2

Page 94: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

--index_replicas Recoverreplicas,butdonotrecoveranydownnodes.

-v

--verbose Displaysdebugoutput.

NotesThe -f and -H optionscannotbeusedatthesametime.

Ifshardsaredown, gptext-recover advisesyoutoreindex.

Ifnoshardsaredown, gptext-recover restoresanyreplicasthataredown.

IfanyGPTextnodesrecoveredusingthe -f or -H optionsfailtostart,thereplicascannotberecovered.Ifthisshouldhappen,resolvethestartupproblemwiththenewlycreatednodes,andthenrecoverthereplicasusingthe gptext-recover-

roption.ItisimportanttorecoverreplicaswhenallGPText

nodesarehealthysothatreplicaswillbedistributedevenlyamongthenodes.

gptext-replicaAddordeleteareplicaforanindexshard.

Syntax

gptext-replica-h

gptext-replicaadd-iindex-name-sshard[-nnode]

gptext-replicadrop-iindex-name-sshard-rreplica[-o]

Parameters

-h

--help Displaysausagemessageandexits.

-iindex-name

--index=index-name Nameoftheindex.

-ffilename

--file=filename

Thenameofafiletoedit,append,orupload.The -i optionmustbeincludedtospecifytheindex.Thefollowingfilesaresupported:

solrconfig.xml –ContainsmostoftheparametersforconfiguringSolritself(see[https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml](http://wiki.apache.org/solr/SolrConfigXml)).

schema.xml –DefinestheanalysischainsthatSolrusesforvariousdifferenttypesofsearchfields(seeSettingupTextAnalysisChains).

stopwords.txt –Listswordsyouwanttoeliminatefromthefinalindex.Youcanalsoeditlanguagespecificstopwordsbyspecifyingafilenameintheformat stopwords_language_code.txt ,where language_code isatwo-charactercodesuchas en , fr ,or es .

protwords.txt –Listsprotectedwordsthatyoudonotwanttobemodifiedbytheanalysischain.Forexample,iPhone.

synonyms.txt –Listswordsthatyouwantreplacedbysynonymsintheanalysischain.

©CopyrightPivotalSoftware,Inc,2013-2017 94 2.1.2

Page 95: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

emoticons.txt –Definesemoticonsforthe text_sm socialmediaanalysischain.Seegptext-start.

currency.txt –Definesexchangeratesbetweenonecurrencyandanother(seeWorkingwithCurrenciesandExchangeRates attheApacheSolrwebsite).

jar_file–thenameofajarfiletouploadtoGPText_Install_Directory/lib/.

-ecommand

--editor=commandEditortouse.Choicesareanyeditorthattakesafilenameonthecommandlineasaparameter.Forexample,vi,vim,emacs,nano,etc.Ifabsent,viisused.

-afilename

--append=filename

Appendsanamedfiletoaconfigurationfileanddistributestheresultingfiles.Requiresthe -f and -i parameters. -f namestheconfigurationfiletowhichyouwanttoappendthefilenamed(includinglocalpath)withthe -aparameter.

-r

--revert

filename

Revertnamedfiletopreviousversion.

-iindex

--index=index Required.Thenameoftheindex.

-sshard

--shard=shard Required.Thenameoftheshardtoaddareplicato.

-nnode

--node=node Optional.Thenodewherethereplicaistobeadded.

-rreplica

--replica=replica Requiredforthedropcommandonly.Thenameofthereplicatodrop.

-o

--onlyifdown Optional.Usedonlywiththedropcommand.Onlydropthereplicaifit’sdown.

NotesTofindthenameofareplicatodrop,check gptext.index_status() .Thenameis core_nodeX whereXisanumber.

Examples1. Addareplicaforindex wikipedia.public.articles inshard shard0 ,onnode node1 .

gptext-replicaadd-iwikipedia.public.articles-sshard0-nnode1

2. Dropthereplicanamed core_node1 forindex wikipedia.public.articles inshard shard0 ifthereplicaisdown.

gptext-replicadrop-iwikipedia.public.articles-sshard0-rcore_node3-o

gptext-restoreRestoreaGPTextindexfromabackupcreatedonasharedfilesystemwiththe gptext-backup utility.

©CopyrightPivotalSoftware,Inc,2013-2017 95 2.1.2

Page 96: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

Syntax

gptext-restore-h

gptext-restore-i<index_name>-p<path>[-v]

Parameters

-h

--help Displaysausagemessageandexits.

-ppath

--pathpath Thepathtothebackupdirectoryoneachhost.

-iindex_name

--indexindex_nameThenameoftheGPTextindextorestore.TheindexmustnotalreadyexistinthetargetGPTextsystem.

Example

gptext-restore-imyindex-p/mnt/backupfs/gptext-backups/mybackup

NotesUsethe gptext-restore utilitytorestoreaGPTextindexbackupfromasharedfilesystem.YoucanrestorethebackuptoanewGPTextsystemorrestoreabackuptorecoveracorruptedGPTextindex.

Theindexyouarerestoringmustnotexist.Ifyouarerestoringanindextorecoveracorruptedindex,youmustfirstdeletetheexistingindexwiththegptext.delete() UDF.The gptext-restore utilitycreatesanewindexandwilloutputanerrorandquitiftheindexyouarerestoringexists.

gptext-startStartsorrestartstheGPTextcluster.

Syntax

gptext-start-h

gptext-start[-r][-s][-v]

Parameters

-h

--help Displaysausagemessageandexits.

-r

--restart RestartstheGPTextcluster.

-s

©CopyrightPivotalSoftware,Inc,2013-2017 96 2.1.2

Page 97: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

--slow_start RestartstheGPTextclusterbystartingindexesoneatatime.

-v

--verbose Displaysdebugoutput.

NotesThe gptext-start-

rcommandcallsthe solr

restartcommandtostopandrestartalloftheSolrinstancesinthecluster.TheGPTextutilitydeterminesifthe

processesarerunningbeforeitcompletes,butitcannotverifythatalloftheSolrprocesseswerestopped.IfitisimportanttobecertainthatSolrprocesseswerestopped,forexampleifyouhavechangedtheJVMoptions,use gptext-stop followedby gptext-start insteadof gptext-start-

r.

The -s ( --slow-start )optionisrecommendedifyouhavealargenumberofindexes.Bydefault,whenaSolrclusterstartsallofthecluster’sindexesarestartedatonce.Withalargenumberofindexes,thenumberofinitialZooKeeperrequestscanresultintimeouterrorsandpossiblefailuretostarttheclustersuccessfully.Withthe -s option,GPTextperformsarollingstart,startingindexesoneatatime,toreduceZooKeepercontentionandallowamorestablestartup.Ifyouhavemorethan50indexesanddonotspecifythe -s option, gptext-start displaysawarningmessageandrequiresyoutoconfirm.Withthe -s option, gptext-start doesnotreturnuntilallindexeshavebeenstarted;withoutthe -s option,the gptext-start commandreturnsimmediately.

Examples1. StarttheGPTextcluster.

gptext-start

2. RestarttheGPTextcluster.

gptext-start-r

gptext-stateDisplaysthestateoftheGPTextclusterandGPTextindexes.

Syntax

gptext-state-h

gptext-state[-d<db-name>][-D][-v]

gptext-state-i<index-name>[-d<db-name>][-c<col1,...>][-v]

gptext-statelist[-d<db-name>][-v]

gptext-statehealthcheck[-d<db-name>][-f<percent>][-v]

Parameters

-h

--help Displaysausagemessageandexits.

-ddb-name

--database=db-name

ThenameofadatabasecontainingtheGPTextschema.

gptext-state searchesalldatabasesforthefunctionsitneedstorun.Iftheuserdoesnothaveaccesspermissiontothedatabaseitbeginswith,it

©CopyrightPivotalSoftware,Inc,2013-2017 97 2.1.2

Page 98: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

fails.Inthiscase,usethe --database= parametertospecifyanaccessibledatabasetosearch.

-D

--detailsListthestatusforeachGPTextindex.Whenomitted, gptext-state listscountsofthenumbersofindexeswithGreen,Yellow,andRedstatuses.

-iindex-name

--index=index-name

Thenameofanindex.Displaysstatisticsforthespecifiedindex.Iftheindex-nameisarootorchildpartition,displaysanyparentorchildpartitions.Thisoptioncannotbeusedwiththe list or healthchecksubcommands.

-ccolumn-list

--stats_columns=

Usedwiththe -i or --index option,specifiesacomma-separatedlistofstatisticstodisplay.Thelistmaycontain replication_factor ,max_shards_per_node , num_docs ,and size_in_bytes .Ifno -c or--stats_columns optionissupplied,allfourstatisticsaredisplayed.

-fdiskfree

--disk_free=diskfreeUsedwiththe healthcheck command,specifiesthepercentagediskfreerequiredperhosttoreportahealthyGPTextcluster.Thedefaultis10.

NotesAllparametersareoptional,exceptthat -i ( --index )isrequiredwhenyouspecify --c ( --stats_columns ).

Ifyouspecifyasubpartitionnamewiththe -i option, gptext-state displaysthenameoftheparenttableorpartitionfromwhichthepartitioninherits.Ifyouspecifythenameofatableorpartitionwithchildpartitions, gptext-state liststhem.

Whenexecutedwithnoarguments, gptext-state displaystheGPTextversionandcountsofindexesintheGreen,Yellow,andRedstates.

AGreenstatemeansthatallshardsandreplicasarehealthy.

AYellowstatemeansthatallshardsareavailable,butoneormorereplicasisdown.

ARedstatemeansthatonemoremoreshardsisdown.

Withthe -D ( --details )optionspecified, gptext-state listsallGPTextindexeswiththecolumns database , index_name ,and state .The state columndisplaysthestatusoftheindexas Green , Yellow ,or Red .

IfanyindexhasaYelloworRedstatus, gptext-state returnsanon-zerovalue.

Examples1. ShowtheGPTextclusterstate.

$gptext-state20161216:14:01:32:029224gptext-state:gpsne:gpadmin-[INFO]:-Checkzookeeperclusterstate...20161216:14:01:32:029224gptext-state:gpsne:gpadmin-[INFO]:-CheckGPTextclusterstatus...20161216:14:01:33:029224gptext-state:gpsne:gpadmin-[INFO]:-CurrentGPTextVersion:2.0.020161216:14:01:33:029224gptext-state:gpsne:gpadmin-[INFO]:-Allnodesareupandrunning.20161216:14:01:34:029224gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------20161216:14:01:34:029224gptext-state:gpsne:gpadmin-[INFO]:-Indexstate.20161216:14:01:34:029224gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------20161216:14:01:34:029224gptext-state:gpsne:gpadmin-[INFO]:-stateindexcount20161216:14:01:34:029224gptext-state:gpsne:gpadmin-[INFO]:-Green4

2. ShowtheGPTextclusterstatewithdetails,specifying wikipedia asadatabasecontainingtheGPTextschema.

$gptext-state-D-dwikipedia20160603:13:54:27:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextclusterstatus...20160603:13:54:27:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CurrentGPTextVersion:2.0.020160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Allnodesareupandrunning.20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Indexstate.20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-databaseindex_namestate20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-wikipediawikipedia.public.articlesGreen20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-gptextdocgptextdoc.public.docsGreen

©CopyrightPivotalSoftware,Inc,2013-2017 98 2.1.2

Page 99: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

3. Show replication_factor and num_docs statisticsfortheGPTextindex wikipedia.public.articles .Specify wikipedia asthedatabasewiththeGPTextschema.

$gptext-state-iwikipedia.public.articles-creplication_factor,num_docs-dwikipedia20160603:13:57:16:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextclusterstatistics...20160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:-ReplicasUp:620160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------20160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Indexwikipedia.public.articlesstatistics.20160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------20160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:-replication_factornum_docs20160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:-311

4. Listallindexes.

$gptext-statelist20160603:13:58:10:303550gptext-state:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------------20160603:13:58:10:303550gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Indexlist20160603:13:58:10:303550gptext-state:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------------20160603:13:58:10:303550gptext-state:gpdb-sandbox:gpadmin-[INFO]:-gptextdoc.public.docs20160603:13:58:10:303550gptext-state:gpdb-sandbox:gpadmin-[INFO]:-wikipedia.public.articles

5. Performahealthcheckwitha20%freediskrequirement.

$gptext-statehealthcheck-f20-dwikipedia20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-ExecutehealthcheckonGPTextcluster!20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextconfigfiles...20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextindexstatus...20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforrequireddiskspace...20160603:13:58:57:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160603:13:58:57:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforrequireduserprivileges...20160603:13:58:57:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160603:13:58:57:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforindexesanddatabaseconsistency...20160603:13:58:58:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160603:13:58:58:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.

6. Checkthestatusofapartitionedtable.

$gptext-state-idemo.public.tweets20170427:07:22:45:014741gptext-state:mdw:gpadmin-[INFO]:-ExecuteGPTextstate...20170427:07:22:45:014741gptext-state:mdw:gpadmin-[INFO]:-Checkzookeeperclusterstate...20170427:07:22:45:014741gptext-state:mdw:gpadmin-[INFO]:-CheckGPTextclusterstatistics...20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-ReplicasUp:420170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:------------------------------------------------20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-Indexdemo.public.tweetsstatistics.20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:------------------------------------------------20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-replication_factormax_shards_per_nodenum_docssizeinbytes20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-2430794448507520170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-Childpartitionindexes:20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_120170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_220170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_320170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_420170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_520170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_620170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_720170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_820170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_920170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_10...

gptext-stopStoptheGPTextclusternodes.

Syntax

©CopyrightPivotalSoftware,Inc,2013-2017 99 2.1.2

Page 100: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

gptext-stop-h

gptext-stop[-v][-f]

Parameters

-h

--help Displaysausagemessageandexits.

-v

--verbose Displaysdebugoutput.

-f

--force ForcefullystopsallSolrprocesses.

Examples1. StoptheGPTextcluster.

gptext-stop

2. ForcestoptheGPTextcluster.

gptext-stop-f```

gptext-uninstallUninstallsGPText,includingdataandinstalledfiles.UninstallsZooKeepernodesiftheywereinstalledwiththeGPTextinstaller.

StopsanyrunningGPTextinstances.

DeletesallSolrdirectoriesinsegmentdirectories.

Deletestheinstallationdirectory.

RemovesallGPTextschemasandindexesfromalldatabases.

UninstallsZooKeeperifitwasinstalledwiththeGPTextinstaller.

Syntax

gptext-uninstall-h|--help

gptext-uninstall[-v|--verbose]

Parameters

-h

--help Displaysausagemessageandexits.

-r

--restart RestartstheGPTextcluster.

©CopyrightPivotalSoftware,Inc,2013-2017 100 2.1.2

Page 101: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

-v

--verbose Displaysdebugoutput.

NotesTouse gptext-uninstall ,youmusthavesuperuserpermissionsonalldatabaseswithGPTextschemas.

gptext-uninstall runsonlyifthereisatleastonedatabasewithaGPTextschema.

Examples1. UninstallGPText.

gptext-uninstall

gptext-upgradeUpgradesthecurrentGPTextsystemtoanewGPTextrelease.

Syntax

gptext-upgrade[-h|--help]

gptext-upgrade[-f<upgrade_file>|--file=<upgrade_file>][-c|--base_check][-v|--verbose]

Parameter Description

-h

--help Displaysausagemessageandexits.

-fupgrade_file

--fileupgrade_file

Providesthepathtotheupgradefile.Thedefaultupgradefileis $GPPERFMONHOME/share/upgrade.yaml .

-c

--base_check

Bydefault, gptext-upgrade checksthattheGPTextenvironmentcanbeupgradedandreportsanyitemsthatmustbecorrectedbeforeupgrading.Whenthe -c or --base-check optionissupplied,theenvironmentcheckisomitted.

-v

--verbose Displaysdebugoutputwhenexecutingthecommand.

NotesTheupgrade_fileisaYAML-formattedscriptdefiningactionstoupgradeaGPTextsystemfromapreviousreleasetothecurrentrelease.Thefileisnotintendedtobeeditedbyusers.Iftheupgrade_filedoesnotcontainsupportforthepreviousGPTextrelease, gptext-upgrade outputsanerrormessageandexits.

©CopyrightPivotalSoftware,Inc,2013-2017 101 2.1.2

Page 102: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

zkManagerCheckstheZooKeeperclusterstate.IfZooKeeperwasinstalledwithGPText, zkManager canstartorstoptheZooKeepercluster.

Syntax

zkManager[-h|--help]

zkManagerstate[-v|--verbose]

zkManagerstart[-v|--verbose]

zkManagerstop[-v|--verbose][-f|--force]

Parameters

-h

--help Displayausagemessageandquit.

-f

--force Whenusedwiththe stop command,performsaforcedstop.

-v

--verbose Displaysdebugoutputwhenexecutingthecommand.

NotesThe zkManager start and zkManager stop commandsareonlyavailableiftheZooKeeperclusterwasinstalledbytheGPTextinstaller.

Bydefault,all gptext-* utilitieschecktheZooKeeperclusterstate.Iftheclusterisnothealthy,theZooKeeperstateinformationisdisplayedtowarntheuser.

The nc (netcat)commandmustbeinstalledonthemasterhost.Run nc inaterminaltoensurethecommandisinstalled.

Examples1. StarttheZooKeepercluster,ifZooKeeperwasinstalledbytheGPTextbinary:

zkManagerstart

2. StoptheZooKeepercluster,ifZooKeeperwasinstalledbytheGPTextbinary:

zkManagerstop

3. ForcestoptheZooKeepercluster,ifZooKeeperwasinstalledbytheGPTextbinary:

zkManagerstop-f

4. CheckthestateoftheZooKeepercluster:

$zkManagerstate20160603:14:17:01:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-Executezookeeperstateprocess.20160603:14:17:01:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-HostportLatencymin/avg/maxMode20160603:14:17:01:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-gpdb-sandbox.localdomain21880/0/17follower20160603:14:17:01:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-gpdb-sandbox.localdomain21890/0/17leader20160603:14:17:01:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-gpdb-sandbox.localdomain21900/0/70follower20160603:14:17:06:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-Done.

©CopyrightPivotalSoftware,Inc,2013-2017 102 2.1.2

Page 103: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

©CopyrightPivotalSoftware,Inc,2013-2017 103 2.1.2

Page 104: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

GPTextandSolrDataTypeMappingsThefollowingtablemapsGreenplumDatabasedatatypestoSolrdatatypes.

IfaGreenplumDatabasedatatypeisnotlisted,itisa text typeinSolr.

IfaGreenplumDatabasedatatypeisanarrayitismappedtoamulti-valuetypeinSolr.Forexample, INT[]

mapstoamulti-value int Solrfield.

GreenplumDatabaseType SolrType

bit string

bool boolean

bytea binary

char string

date tdate

float4 float

float8 double

int int

int2 int

int4 int

int8 long

interval string

money string

name string

numeric double

point point

text text

time string

timestamp tdate

timestamptz tdate

timetz string

uuid uuid

varbit string

varchar text

©CopyrightPivotalSoftware,Inc,2013-2017 104 2.1.2

Page 105: Table of Contentsgptext.docs.pivotal.io › archives › GPText-docs-212.pdf · Installing GPText also installs Apache Solr Cloud 6.1 and, optionally, Apache ZooKeeper. Following

GPTextConfigurationParametersGPTextconfigurationparameterscanbeoverriddenbysettinganewvalueinaGreenplumDatabasesession.ChangesmadetoconfigurationparametersonlyaffectfutureGPTextoperations;existingindexesusetheparametervaluesthatweresetwhentheywerecreated.

SeeChangingGPTextServerConfigurationParametersforinformationaboutchangingconfigurationparametersandexamples.

ThefollowingtableliststheGPTextconfigurationparameterswiththeirdefaultsandvalueconstraints.

admin_timeout Timeout,inseconds,foradminrequests(create_index,etc.). 30 INT_MAX 3600

commit_timeout Timeout,inseconds,forpreparecommitandcommitoperations. 30 INT_MAX 3600

delete_timeout Timeout,inseconds,fordeleterequests. 30 INT_MAX 3600

extension_factorMaximumnumberofreplicasthatcanbeaddedforanindexperGPTextnodeaftertheindexiscreated.

0 10 2

facet_timeout Timeout,inseconds,forfacetingqueries. 30 INT_MAX 3600

failover_factorMinimumratioofSolrnodesthatmustbeupinordertocreateanewindex.(SolrNodesUp/TotalSolrNodes )

0.0 1.0 0.8

hl_post_tag Markupthat gptext-highlight() insertsaftertermsinsearchresults. '</em>'

hl_pre_tag Markupthat gptext-highlight() insertsbeforetermsinsearchresults. '<em>'

idx_buffer_size Sizeofindexingbufferinbytes. 4096 67108864 134217728

idx_delim Delimitertouseduringindexing.

comma

','

idx_encapsulatorThecharacteroptionallyusedtosurroundvaluestopreservecharacterssuchastheCSVseparatororwhitespace.

quote

'"'

idx_escape Escapecharactertouseforindexing.

backslash

'\\'

index_timeout Timeout,inseconds,forreceivingresponsetoindexingoperation. 30 INT_MAX 3600

optimize_timeout Timeout,inseconds,foroptimizeoperations. 30 INT_MAX 3600

ping_timeout Timeout,inseconds,forpingrequests. 30 INT_MAX 120

replication_factor Thenumberofreplicaspershardforanewlycreatedindex. 0 10 2

replication_timeout Timeout,inseconds,forreplicationoperations(backup,restore). 30 INT_MAX 43200

rollback_timeout Timeout,inseconds,forrollbackoperations. 30 INT_MAX 3600

search_batch_size Batchsizeforsearchrequests. 1 INT_MAX 2500000

search_buffer_size Buffersizeforsearchresults,inbytes. 4096 67108864 16777216

search_param_separator Delimitertouseinthe options parameterofthe gptext.search() UDF. '&'

search_post_buffer_size Postbuffersizeforsearch,inbytes. 512 4194304 4096

search_timeout Timeout,inseconds,forsearches. 30 INT_MAX 600

stats_timeout Timeout,inseconds,forobtainingstatistics. 30 INT_MAX 600

terms_batch_size Batchsizefortermsoperations. 1 INT_MAX 1000

©CopyrightPivotalSoftware,Inc,2013-2017 105 2.1.2