table of contentsgptext.docs.pivotal.io › archives › gptext-docs-212.pdf · installing gptext...
TRANSCRIPT
12347101317192024313745505355596087104105
TableofContents
TableofContentsPivotalGPText2.1.2DocumentationReleaseNotesPivotal®GPText2.1.2ReleaseNotesPivotal®GPText2.1.1ReleaseNotesPivotal®GPText2.1ReleaseNotesInstallingGPTextUpgradingGPTextUsingPivotalGPTextIntroductiontoPivotalGPTextWorkingWithGPTextIndexesQueryingGPTextIndexesCustomizingGPTextIndexesAdministeringGPTextGPTextHighAvailabilityGPTextBestPracticesGlossaryGPTextReferencesGPTextFunctionReferenceGPTextManagementUtilitiesGPTextandSolrDataTypeMappingsGPTextConfigurationParameters
©CopyrightPivotalSoftware,Inc,2013-2017 1 2.1.2
PivotalGPText2.1.2Documentation
PivotalGPTextDocumentationPDF
PivotalGPText2.1.2ReleaseNotes
InstallingPivotalGPText
UpgradingPivotalGPText
UsingPivotalGPText
GPTextReferences
AdditionalResourcesPivotalGreenplumDatabase
ApacheSolrWebSite
ApacheMADlib
©CopyrightPivotalSoftware,Inc,2013-2017 2 2.1.2
ReleaseNotesRelease-specificinformationincludingnew,changed,anddeprecatedfeatures,resolvedandknownissues,migrationinformation,andsupportedhardwareandsoftware.
PivotalGPText2.1.1ReleaseNotes
PivotalGPText2.1.0ReleaseNotes
PivotalGPText2.0.0ReleaseNotes
©CopyrightPivotalSoftware,Inc,2013-2017 3 2.1.2
Pivotal®GPText2.1.2ReleaseNotesThisdocumentcontainsreleaseinformationforPivotalGPText2.1.2.
Published:August,2017
AboutPivotalGPTextPivotalGPTextjoinstheGreenplumDatabasemassivelyparallel-processingdatabaseserverwithApacheSolrCloudenterprisesearchandtheApacheMADlibAnalyticsLibrarytoprovidelarge-scaleanalyticsprocessingandbusinessdecisionsupport.GPTextincludesfreetextsearchaswellassupportfortextanalysis.
GPTextincludesthefollowingfeatures:
TheGPTextdatabaseschemaprovidesin-databaseaccesstoApacheSolrindexingandsearching
Customtokenizersforinternationaltextandsocialmediatext
AUniversalQueryProcessorthatacceptsquerieswithmixedsyntaxfromsupportedSolrqueryprocessors
Facetedsearchresults
Termhighlightinginresults
Greateremphasisonhighavailability
TheGPTextmanagementutilitysuiteincludescommand-lineutilitiestoperformthefollowingtasks:
Start,stop,andmonitorZooKeeperandGPTextnodes
ConfigureGPTextnodesandindexes
Addanddeletereplicasforindexshards
BackupandrestoreGPTextindexes
RecoveraGPTextnode
ExpandtheGPTextclusterbyaddingGPTextnodes
PrerequisitesInstallingGPTextalsoinstallsApacheSolrCloud6.1and,optionally,ApacheZooKeeper.
FollowingareGPTextinstallationprerequisites.
InstallandconfigureyourGreenplumDatabasesystem,version4.3.6orhigher.SeethePivotalGreenplumDatabaseInstallationGuideathttps://gpdb.docs.pivotal.io .
GPTextrunsonRedHatEnterpriseLinux5.xor6.x.
GPTextcannotbeinstalledontoasharedNFSmount.
InstallOracleJRE1.8.xandaddits bin directorytothe PATH onallhostsinthecluster.
Ensurethat nc (netcat)isinstalledonallGreenplumclusterhosts( sudo yum install nc ).
Installing lsof onallclusterhostsisrecommended( sudo yum install lsof ).
GPTextnodescanbeinstalledontheGreenplumDatabaseclusterhostsalongsidetheGreenplumsegmentsoronadditional,non-databasehostsaccessibleontheGreenplumclusternetwork.AllhostsparticipatingintheGPTextsystemmusthavethesameoperatingsystemandconfigurationandhavepasswordless-sshaccessforthegpadminuser.SeethePivotalGreenplumDatabaseInstallationGuideforinstructionstoconfigurehosts.
IfyouplantoplaceGPTextnodesontheGreenplumDatabasesegmenthosts,ensurethatyoureservememoryforGPTextusewhenyouconfigureGreenplumDatabase.TodeterminethememorytosetasideforGPText,multiplythenumberofGPTextnodestocreateoneachGreenplumsegmenthostbytheJVMmaximumsize.SubtractthismemoryfromthephysicalRAMwhencalculatingthevaluefortheGreenplumDatabasegp_vmem_protect_limit serverconfigurationparameter.SeetheGreenplumDatabaseserverconfigurationparameter gp_vmem_protect_limit intheGreenplumDatabaseReferenceGuideforrecommendedmemorycalculationformulasorvisittheGPDBVirtualMemoryCalculator website.
ApacheSolrrequiresaZooKeeperclusterwithatminimumthreenodes(fivenodesrecommended).Youcaninstalla“binding”ZooKeeperclusterwith
GPTextrequiresOracleJDK1.8.x.YoucannotuseanOpenJDKJREwithGPText.
©CopyrightPivotalSoftware,Inc,2013-2017 4 2.1.2
GPTextontheGreenplumclusterhosts,oryoucanuseanexistingZooKeepercluster.WhendeployedalongsideGreenplumDatabasesegments,ZooKeeperperformancecanbeaffectedunderheavydatabaseload.Forbestperformance,installaZooKeeperclusteronseparatehostswithnetworkconnectivitytotheGreenplumnetwork.
NewFeatures,Enhancements,andFixesinGPText2.1.2
FeaturesandEnhancementsAnoptionalSolroptionsparameterisaddedtothefollowingGPTextfunctions:
gptext.search_count()
gptext.faceted_range_search()
gptext.faceted_query_search()
gptext.faceted_field_search()
The options parameterisanampersand-delimitedlistofSolroptionstoincludeinthesearch.SeeSolroptionsformoreaboutusingSolroptions.
AddedsupportfortheGreenplumDatabase5.0 uuid datatype.GreenplumDatabase uuid columnsaremappedtothe solr.UUIDField typeinGPTextindexes.Columnsof uuid typemayalsobespecifiedfortheuniqueidcolumn( id_col )inthe gptext.create_index() function.
Changedthedefaultvalueofthe term_batch_size GPTextconfigurationparameterfrom50000to1000.Thisreducesthepossibilityofanoutofmemory(OOM)errorwhenexecutingthe gptext.terms() function.IfyouexperienceOOMerrorswiththisnewdefaultvalue,youmayneedtofurtherreducethevalueof term_batch_size .FormoreinformationseeTermsQueriesandOutofMemoryErrors.
Inpreviousreleases,auserwaspermittedaccesstoaGPTextindexiftheirrolehadpermissiontoaccesstheGreenplumDatabasetablefromwhichtheindexwascreated.Permissionsarenolongercheckedonthebasetable.ThismakesitpossibletodropthebasetableandcontinuetosearchtheGPTextindex.GPTextfunctionsthatdependupontheexistenceofthebasetable,suchas gptext.add_field() ,arenotallowedafterthedatabasetablehasbeendropped.Ifthetablewaspartitioned,GPTextqueriesmustspecifytheroottablename.
FixesWhenthe defType=dismax SolroptionwasaddedtotheSolroptionsparameterofthe gptext.search() function,GPTextlowercasedtheoptionnameto deftype beforesubmittingthequerytoSolr,causingthequerytoreturnunexpectedresults.Thishasbeenfixed.
KnownIssuesFollowingareknownissuesinGPText.Workaroundsareprovidedwhenavailable.
WildcardsinGPTextSearchOptionsSolrdoesnotreturnallfieldswhenthe fl Solrsearchoptioncontainsawildcardthatmatchesfieldnames.Forexample,givenatablewithcolumnscontenta and contentb ,specifying fl=contenta,contentb,(sum,1,1) correctlyreturnsthreefields.Specifying fl=cont*,sum(1,1) correctlyreturns contenta andcontentb ,butomitsthepseudo-field sum(1,1) .
Specifyingawildcardtomatchallfields( fl=*,sum(1,1) )alsoomitsthepseudo-field.
IndexLoadFailureAfterConfigurationFileErrorIfSolrfailstoloadanindexbecauseofaconfigurationfileerror,andthentheindexisdroppedwithoutfirstcorrectingtheconfigurationfileerror,theindexcannotberecreateduntilGPTextisrestarted.Thiscanhappenifyouedit managed-schema or solrconfig.xml andintroduceanXMLsyntaxerrororatypoinconfigurationvalues.
Workaround:
1. Whenanindexfailstoload,checktheSolrlogtofindthecause.
2. Ifthecauseisaconfigurationfileerror,suchasinvalidXML,usethe gptext-config utilitytoeditthefileandfixtheerror.Droppingtheindexwithoutfirstcorrectingtheerrorisnotrecommended.
©CopyrightPivotalSoftware,Inc,2013-2017 5 2.1.2
3. Ifyouhavedroppedanindexthatfailedtoloadwithoutfirstcorrectingthecauseofthefailure,youmustrestartGPTextbeforeyoucanrecreatetheindex.Run gptext-start -r torestartGPText.
StartupFailurewithLargeNumbersofIndexesWhenthereisalargenumberofSolrcores,SolrCloudcanfailtorestartsuccessfully,witherrormessagesindicatingfailuretoelectleadersforshards.ThisisaknownSolrissue;seehttps://issues.apache.org/jira/browse/SOLR-5990 intheApacheSolrJiraforanexample.Becauseofthisissue,itisrecommendedtoavoiddesigningGPTextapplicationsthatcreatelargenumbersofindexes,shards,andreplicas.Thenumberofcoresyoucancreatebeforeyouobservethisbehaviorishardwaredependent,soyoushouldtesttodetermineyoursystem’slimits.Youcancreateandsuccessfullyoperatealargernumbersofindexesthancanberestartedsuccessfullylater,sobesuretotestrestartingGPTexttodetermineapracticallimit.
SettingGPTextConfigurationParametersWithoutFirstSettingcustom_variable_classesIfthe custom_variable_classes GreenplumDatabaseserverconfigurationparameterdoesnotincludethevalue“gptext”,attemptingtosetaGPTextconfigurationparameterreturnsanerrormessage,forexample:
mydb-#setgptext.replication_factor=4;WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)ERROR:unrecognizedconfigurationparameter"gptext.replication_factor"
InGPText2.0,inadditiontotheerrormessage,thevalueoftheconfigurationparameterpersistedinZooKeeperiszero,replacingthepreviousvalueoftheparameter.
mydb-#showgptext.replication_factor;gptext.replication_factor----------------------------0
BeginningwithGPText2.1,theerrormessageisstillgenerated,howeverthevaluesavedinZooKeeperisthevaluespecifiedinthe set command,4intheprecedingexample.
Topreventtheerrormessage,beforesettinganyGPTextconfigurationparameters,usethe gpconfig command-lineutilitytosetthe custom_variable_classes
configurationparameter:
$gpconfig-ccustom_variable_classes-v'gptext'
CannotSourcegreenplum_path.shaftergreenplum-text_path.shTheGPText greenplum-text_path.sh scriptrequiresthattheGreenplumDatabase greenplum_path.sh scriptberunfirst.Forexample:
$source/usr/local/greenplum-db/greenplum_path.sh$source/usr/local/greenplum-text/greenplum-text_path.sh
TheGPTextscriptmodifiesthe PATH and PYTHONPATH environmentvariablessetpreviouslybytheGreenplumDatabasescript.
Ifyousource greenplum_path.sh againafteryouhaverun greenplum-text_path.sh ,GPText’s PYTHONPATH isoverwrittenandGPTextfails.
Aworkaroundistosourcethe greenplum-text_path.sh scriptfromthe greenplum_path.sh script.Editthefile $GPHOME/greenplum_path.sh andaddthefollowinglinetotheendofthefile:
source/usr/local/greenplum-text/greenplum-text_path.sh``
©CopyrightPivotalSoftware,Inc,2013-2017 6 2.1.2
Pivotal®GPText2.1.1ReleaseNotesThisdocumentcontainsreleaseinformationforPivotalGPText2.1.1.
Published:July2017
AboutPivotalGPTextPivotalGPTextjoinstheGreenplumDatabasemassivelyparallel-processingdatabaseserverwithApacheSolrCloudenterprisesearchandtheApacheMADlibAnalyticsLibrarytoprovidelarge-scaleanalyticsprocessingandbusinessdecisionsupport.GPTextincludesfreetextsearchaswellassupportfortextanalysis.
GPTextincludesthefollowingfeatures:
TheGPTextdatabaseschemaprovidesin-databaseaccesstoApacheSolrindexingandsearching
Customtokenizersforinternationaltextandsocialmediatext
AUniversalQueryProcessorthatacceptsquerieswithmixedsyntaxfromsupportedSolrqueryprocessors
Facetedsearchresults
Termhighlightinginresults
Greateremphasisonhighavailability
TheGPTextmanagementutilitysuiteincludescommand-lineutilitiestoperformthefollowingtasks:
Start,stop,andmonitorZooKeeperandGPTextnodes
ConfigureGPTextnodesandindexes
Addanddeletereplicasforindexshards
BackupandrestoreGPTextindexes
RecoveraGPTextnode
ExpandtheGPTextclusterbyaddingGPTextnodes
PrerequisitesInstallingGPTextalsoinstallsApacheSolrCloudand,optionally,ApacheZooKeeper.
FollowingareGPTextinstallationprerequisites.
InstallandconfigureyourGreenplumDatabasesystem,version4.3.6orhigher.SeethePivotalGreenplumDatabaseInstallationGuideathttps://gpdb.docs.pivotal.io .
GPTextrunsonRedHatEnterpriseLinux5.xor6.x.
GPTextcannotbeinstalledontoasharedNFSmount.
InstallOracleJRE1.8.xandaddits bin directorytothe PATH onallhostsinthecluster.
Ensurethat nc (netcat)isinstalledonallGreenplumclusterhosts( sudo yum install nc ).
Installing lsof onallclusterhostsisrecommended( sudo yum install lsof ).
GPTextnodescanbeinstalledontheGreenplumDatabaseclusterhostsalongsidetheGreenplumsegmentsoronadditional,non-databasehostsaccessibleontheGreenplumclusternetwork.AllhostsparticipatingintheGPTextsystemmusthavethesameoperatingsystemandconfigurationandhavepasswordless-sshaccessforthegpadminuser.SeethePivotalGreenplumDatabaseInstallationGuideforinstructionstoconfigurehosts.
IfyouplantoplaceGPTextnodesontheGreenplumDatabasesegmenthosts,ensurethatyoureservememoryforGPTextusewhenyouconfigureGreenplumDatabase.TodeterminethememorytosetasideforGPText,multiplythenumberofGPTextnodestocreateoneachGreenplumsegmenthostbytheJVMmaximumsize.SubtractthismemoryfromthephysicalRAMwhencalculatingthevaluefortheGreenplumDatabasegp_vmem_protect_limit serverconfigurationparameter.SeetheGreenplumDatabaseserverconfigurationparameter gp_vmem_protect_limit intheGreenplumDatabaseReferenceGuideforrecommendedmemorycalculationformulasorvisittheGPDBVirtualMemoryCalculator website.
ApacheSolrrequiresaZooKeeperclusterwithatminimumthreenodes(fivenodesrecommended).Youcaninstalla“binding”ZooKeeperclusterwith
GPTextrequiresOracleJDK1.8.x.YoucannotuseanOpenJDKJREwithGPText.
©CopyrightPivotalSoftware,Inc,2013-2017 7 2.1.2
GPTextontheGreenplumclusterhosts,oryoucanuseanexistingZooKeepercluster.WhendeployedalongsideGreenplumDatabasesegments,ZooKeeperperformancecanbeaffectedunderheavydatabaseload.Forbestperformance,installaZooKeeperclusteronseparatehostswithnetworkconnectivitytotheGreenplumnetwork.
NewFeatures,Enhancements,andFixesinGPText2.1.1
FeaturesandEnhancementsAnewGPTextinstallationparameter, GPTEXT_JAVA_HOME ,hasbeenaddedtothe gptext_install_config installationconfigurationfiletospecifythehomedirectoryoftheJavainstallationtouseforrunningZooKeeperandSolrprocesses.Ifthisparameterisnotset,theJREisdeterminedfromthePATH and JAVA_HOME environmentvariables.IfyouhaveJavaapplicationsthatrequireaversionofJavanotsupportedwithGPText,youcansetthatversionintheenvironmentvariablesandsetthe GPTEXT_JAVA_HOME parametertooverrideit.
GPText2.1.1canbeusedwithGreenplumDatabase5.0Beta.AGPTextsystemdeployedonGreenplumDatabase4.3.xcanbemigratedtoGreenplumDatabase5.0.
GPTextnowcansupportuniqueidcolumnswithdatatypes int4 , int8 , varchar ,and text . gptext.search() resultsreturntheidcolumnastext values.Ifthe id_col isan int4 or int8 type,itmaybenecessarytocasttheidcolumninreturnedresultstothecorrectnumerictype.Forexample,ifyoujoinsearchresultswiththeoriginaltableororderresultsbynumericid,youwillneedtocastthereturnedidcolumntothecorrecttype.
GPTexthasbeenoptimizedtolookupconfigurationparameters(GUCs)inZooKeeperontheGreenplummasterandthendistributetheparameterstothesegments,eliminatingZooKeeperlookupsfromeveryGPTextinstance.
New timestamp and ttimestamp fieldtypesareaddedtotheSolr managed-schema toallowGPTexttostoretimestampswiththesameprecisionasGreenplumDatabase.Ifyourequiremicro-secondprecisionforfieldsstoredinaGPTextindexyoucaneditthe managed-schema andchangethetype attributeofthe <field> elementfrom date or tdate to timestamp or ttimestamp .
Fixes[MPP-28960]GPTextinstallationfailedwhensshwasconfiguredtorequirefully-qualifieddomainname(FQDN).Thishasbeenfixed.
KnownIssuesFollowingareknownissuesinGPText.Workaroundsareprovidedwhenavailable.
WildcardsinGPTextSearchOptionsSolrdoesnotreturnallfieldswhenthe fl Solrsearchoptioncontainsawildcardthatmatchesfieldnames.Forexample,givenatablewithcolumnscontenta and contentb ,specifying fl=contenta,contentb,(sum,1,1) correctlyreturnsthreefields.Specifying fl=cont*,sum(1,1) correctlyreturns contenta andcontentb ,butomitsthepseudo-field sum(1,1) .
Specifyingawildcardtomatchallfields( fl=*,sum(1,1) )alsoomitsthepseudo-field.
IndexLoadFailureAfterConfigurationFileErrorIfSolrfailstoloadanindexbecauseofaconfigurationfileerror,andthentheindexisdroppedwithoutfirstcorrectingtheconfigurationfileerror,theindexcannotberecreateduntilGPTextisrestarted.Thiscanhappenifyouedit managed-schema or solrconfig.xml andintroduceanXMLsyntaxerrororatypoinconfigurationvalues.
Workaround:
1. Whenanindexfailstoload,checktheSolrlogtofindthecause.
2. Ifthecauseisaconfigurationfileerror,suchasinvalidXML,usethe gptext-config utilitytoeditthefileandfixtheerror.Droppingtheindexwithoutfirstcorrectingtheerrorisnotrecommended.
3. Ifyouhavedroppedanindexthatfailedtoloadwithoutfirstcorrectingthecauseofthefailure,youmustrestartGPTextbeforeyoucanrecreatetheindex.Run gptext-start -r torestartGPText.
©CopyrightPivotalSoftware,Inc,2013-2017 8 2.1.2
StartupFailurewithLargeNumbersofIndexesWhenthereisalargenumberofSolrcores,SolrCloudcanfailtorestartsuccessfully,witherrormessagesindicatingfailuretoelectleadersforshards.ThisisaknownSolrissue;seehttps://issues.apache.org/jira/browse/SOLR-5990 intheApacheSolrJiraforanexample.Becauseofthisissue,itisrecommendedtoavoiddesigningGPTextapplicationsthatcreatelargenumbersofindexes,shards,andreplicas.Thenumberofcoresyoucancreatebeforeyouobservethisbehaviorishardwaredependent,soyoushouldtesttodetermineyoursystem’slimits.Youcancreateandsuccessfullyoperatealargernumbersofindexesthancanberestartedsuccessfullylater,sobesuretotestrestartingGPTexttodetermineapracticallimit.
SettingGPTextConfigurationParametersWithoutFirstSettingcustom_variable_classesIfthe custom_variable_classes GreenplumDatabaseserverconfigurationparameterdoesnotincludethevalue“gptext”,attemptingtosetaGPTextconfigurationparameterreturnsanerrormessage,forexample:
mydb-#setgptext.replication_factor=4;WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)ERROR:unrecognizedconfigurationparameter"gptext.replication_factor"
InGPText2.0,inadditiontotheerrormessage,thevalueoftheconfigurationparameterpersistedinZooKeeperiszero,replacingthepreviousvalueoftheparameter.
mydb-#showgptext.replication_factor;gptext.replication_factor----------------------------0
BeginningwithGPText2.1,theerrormessageisstillgenerated,howeverthevaluesavedinZooKeeperisthevaluespecifiedinthe set command,4intheprecedingexample.
Topreventtheerrormessage,beforesettinganyGPTextconfigurationparameters,usethe gpconfig command-lineutilitytosetthe custom_variable_classes
configurationparameter:
$gpconfig-ccustom_variable_classes-v'gptext'
CannotSourcegreenplum_path.shaftergreenplum-text_path.shTheGPText greenplum-text_path.sh scriptrequiresthattheGreenplumDatabase greenplum_path.sh scriptberunfirst.Forexample:
$source/usr/local/greenplum-db/greenplum_path.sh$source/usr/local/greenplum-text/greenplum-text_path.sh
TheGPTextscriptmodifiesthe PATH and PYTHONPATH environmentvariablessetpreviouslybytheGreenplumDatabasescript.
Ifyousource greenplum_path.sh againafteryouhaverun greenplum-text_path.sh ,GPText’s PYTHONPATH isoverwrittenandGPTextfails.
Aworkaroundistosourcethe greenplum-text_path.sh scriptfromthe greenplum_path.sh script.Editthefile $GPHOME/greenplum_path.sh andaddthefollowinglinetotheendofthefile:
source/usr/local/greenplum-text/greenplum-text_path.sh``
©CopyrightPivotalSoftware,Inc,2013-2017 9 2.1.2
Pivotal®GPText2.1ReleaseNotesThisdocumentcontainsreleaseinformationforPivotalGPText2.1.
Published:June2017
AboutPivotalGPTextPivotalGPTextjoinstheGreenplumDatabasemassivelyparallel-processingdatabaseserverwithApacheSolrCloudenterprisesearchandtheApacheMADlibAnalyticsLibrarytoprovidelarge-scaleanalyticsprocessingandbusinessdecisionsupport.GPTextincludesfreetextsearchaswellassupportfortextanalysis.
GPTextincludesthefollowingfeatures:
TheGPTextdatabaseschemaprovidesin-databaseaccesstoApacheSolrindexingandsearching
Customtokenizersforinternationaltextandsocialmediatext
AUniversalQueryProcessorthatacceptsquerieswithmixedsyntaxfromsupportedSolrqueryprocessors
Facetedsearchresults
Termhighlightinginresults
Greateremphasisonhighavailability
TheGPTextmanagementutilitysuiteincludescommand-lineutilitiestoperformthefollowingtasks:
Start,stop,andmonitorZooKeeperandGPTextnodes
ConfigureGPTextnodesandindexes
Addanddeletereplicasforindexshards
BackupandrestoreGPTextindexes
RecoveraGPTextnode
ExpandtheGPTextclusterbyaddingGPTextnodes
PrerequisitesInstallingGPTextalsoinstallsApacheSolrCloudand,optionally,ApacheZooKeeper.
FollowingareGPTextinstallationprerequisites.
InstallandconfigureyourGreenplumDatabasesystem,version4.3.6orhigher.SeethePivotalGreenplumDatabaseInstallationGuideathttps://gpdb.docs.pivotal.io .
GPTextrunsonRedHatEnterpriseLinux5.xor6.x.
GPTextcannotbeinstalledontoasharedNFSmount.
InstallOracle JRE 1.8.x andaddits bin directorytothe PATH onallhostsinthecluster.
Ensurethat nc (netcat)isinstalledonallGreenplumclusterhosts( sudo yum install nc ).
Installing lsof onallclusterhostsisrecommended( sudo yum install lsof ).
GPTextnodescanbeinstalledontheGreenplumDatabaseclusterhostsalongsidetheGreenplumsegmentsoronadditional,non-databasehostsaccessibleontheGreenplumclusternetwork.AllhostsparticipatingintheGPTextsystemmusthavethesameoperatingsystemandconfigurationandhavepasswordless-sshaccessforthegpadminuser.SeethePivotalGreenplumDatabaseInstallationGuideforinstructionstoconfigurehosts.
IfyouplantoplaceGPTextnodesontheGreenplumDatabasesegmenthosts,ensurethatyoureservememoryforGPTextusewhenyouconfigureGreenplumDatabase.TodeterminethememorytosetasideforGPText,multiplythenumberofGPTextnodestocreateoneachGreenplumsegmenthostbytheJVMmaximumsize.SubtractthismemoryfromthephysicalRAMwhencalculatingthevaluefortheGreenplumDatabasegp_vmem_protect_limit serverconfigurationparameter.SeetheGreenplumDatabaseserverconfigurationparameter gp_vmem_protect_limit intheGreenplumDatabaseReferenceGuideforrecommendedmemorycalculationformulasorvisittheGPDBVirtualMemoryCalculator website.
ApacheSolrrequiresaZooKeeperclusterwithatminimumthreenodes(fivenodesrecommended).Youcaninstalla“binding”ZooKeeperclusterwithGPTextontheGreenplumclusterhosts,oryoucanuseanexistingZooKeepercluster.WhendeployedalongsideGreenplumDatabasesegments,ZooKeeperperformancecanbeaffectedunderheavydatabaseload.Forbestperformance,installaZooKeeperclusteronseparatehostswithnetworkconnectivitytotheGreenplumnetwork.
©CopyrightPivotalSoftware,Inc,2013-2017 10 2.1.2
NewFeaturesandEnhancementsinGPText2.1Previously,allZooKeepernodesrunningonthesamehostwrotemessagestothesamelogfile.Noweachnodewritestoalogfileinitsowndatadirectory.
GPTextwasunabletoparse“ - ”and“ _ ”charactersinthe ZOO_HOSTS configurationparameterinthe gptext_install_config file.Thisisfixed.
The gptext-state managementutilityhasanew -D ( --details )option.Thedefaultformatfortheutility’soutputischangedtodisplaycountsofindexesintheGreen,Yellow,andRedstates.Withthe -D option,thecommandincludestheGreen,Yellow,orRedstatusofeachGPTextindex.Theutilityreturnsanon-zerostatusifanyindexesarenotintheGreenstate.
OptimizationsandchangeshavebeenmadetomakestartingupaGPTextclusterwithalargenumberofindexesmorestable:
Thedefaultvalueofthe replication_factor configurationparameterischangedfrom3to2.The gptext-start utilityhasanew --slow-start ( -s )optionthatperformsa“rolling-start,”startinguponeindexatatime.ThisoptionshouldbeusedwhenstartingaGPTextclusterthathasalargenumberofindexes.Ifyouhavemorethan50indexesandomitthisoption,theutilitypromptsyoutoconfirmthatyouwanttoperformafaststartup.OneGPTextinstanceoneachhostintheclusterisassignedtheoverseerroletohelpreducedemandonZooKeeper.The coreLoadThreads inthe solr.xml configurationfileisincreasedto512.The solr.jetty.threads.max parameterinthe jetty.xml fileforeachSolrbinaryisincreasedto30000.
AddedsupportforindexingGreenplumDatabasepartitionedtables.
ABoolean partitioned columnisaddedtotheresultsofthe gptext.index_status() function.The gptext-state -i <index_name> commandincludesalistofchildpartitionindexesforpartitionedtablesinalldatabases.Anew gptext.partition_status() functionlistschildpartitionindexesinthecurrentdatabase.
GPTextnodesarenolongerrequiredtorunonGreenplumDatabasesegmenthosts.GPTextnodescanbedeployedonseparatehostswithconnectivitytotheGreenplumnetwork,andpasswordless-sshaccessforthegpadminuserbetweenallGreenplumDatabaseandGPTextnodes.TheGPTextinstallerandthe gptext-uninstall , gptext-expand ,and gptext-recover managementutilitiesareupdatedtoaccommodatethischange.
Addedautility, gptext_migrator ,tomigrateaGPTextsystemintoaGreenplumDatabaseclusterthathasbeenupgradedtoanewGPDBrelease.MigratingdoesnotchangetheGPTextversion;itmovesthecurrentGPTextsystemtoadifferentGreenplumDatabasecluster.MigratingaGPTextsystemfromGPDB4.3.xtoGPDB5.xisnotcurrentlysupported.
TheGPText2.1installercanupgradeaGPText2.0systemtothenewGPTextrelease.Ifrequested,theinstallerrunsanewutility, gptext-upgrade ,toupgradetheGPTextsystem.The gptext-upgrade utilitycanalsoberunseparatelyfromtheinstaller.
KnownIssuesFollowingareknownissuesinGPText.Workaroundsareprovidedwhenavailable.
WildcardsinGPTextSearchOptionsSolrdoesnotreturnallfieldswhenthe fl Solrsearchoptioncontainsawildcardthatmatchesfieldnames.Forexample,givenatablewithcolumnscontenta and contentb ,specifying fl=contenta,contentb,(sum,1,1) correctlyreturnsthreefields.Specifying fl=cont*,sum(1,1) correctlyreturns contenta andcontentb ,butomitsthepseudo-field sum(1,1) .
Specifyingawildcardtomatchallfields( fl=*,sum(1,1) )alsoomitsthepseudo-field.
IndexLoadFailureAfterConfigurationFileErrorIfSolrfailstoloadanindexbecauseofaconfigurationfileerror,andthentheindexisdroppedwithoutfirstcorrectingtheconfigurationfileerror,theindexcannotberecreateduntilGPTextisrestarted.Thiscanhappenifyouedit managed-schema or solrconfig.xml andintroduceanXMLsyntaxerrororatypoinconfigurationvalues.
Workaround:
1. Whenanindexfailstoload,checktheSolrlogtofindthecause.
2. Ifthecauseisaconfigurationfileerror,suchasinvalidXML,usethe gptext-config utilitytoeditthefileandfixtheerror.Droppingtheindexwithoutfirstcorrectingtheerrorisnotrecommended.
3. Ifyouhavedroppedanindexthatfailedtoloadwithoutfirstcorrectingthecauseofthefailure,youmustrestartGPTextbeforeyoucanrecreatetheindex.Run gptext-start -r torestartGPText.
©CopyrightPivotalSoftware,Inc,2013-2017 11 2.1.2
StartupFailurewithLargeNumbersofIndexesWhenthereisalargenumberofSolrcores,SolrCloudcanfailtorestartsuccessfully,witherrormessagesindicatingfailuretoelectleadersforshards.ThisisaknownSolrissue;seehttps://issues.apache.org/jira/browse/SOLR-5990 intheApacheSolrJiraforanexample.Becauseofthisissue,itisrecommendedtoavoiddesigningGPTextapplicationsthatcreatelargenumbersofindexes,shards,andreplicas.Thenumberofcoresyoucancreatebeforeyouobservethisbehaviorishardwaredependent,soyoushouldtesttodetermineyoursystem’slimits.Youcancreateandsuccessfullyoperatealargernumbersofindexesthancanberestartedsuccessfullylater,sobesuretotestrestartingGPTexttodetermineapracticallimit.
SettingGPTextConfigurationParametersWithoutFirstSettingcustom_variable_classesIfthe custom_variable_classes GreenplumDatabaseserverconfigurationparameterdoesnotincludethevalue“gptext”,attemptingtosetaGPTextconfigurationparameterreturnsanerrormessage,forexample:
mydb-#setgptext.replication_factor=4;WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)WARNING:PleaselogonagaintomakeGUCsettingtakeeffect.(GucValue.h:301)ERROR:unrecognizedconfigurationparameter"gptext.replication_factor"
InGPText2.0,inadditiontotheerrormessage,thevalueoftheconfigurationparameterpersistedinZooKeeperiszero,replacingthepreviousvalueoftheparameter.
mydb-#showgptext.replication_factor;gptext.replication_factor----------------------------0
BeginningwithGPText2.1,theerrormessageisstillgenerated,howeverthevaluesavedinZooKeeperisthevaluespecifiedinthe set command,4intheprecedingexample.
Topreventtheerrormessage,beforesettinganyGPTextconfigurationparameters,usethe gpconfig command-lineutilitytosetthe custom_variable_classes
configurationparameter:
$gpconfig-ccustom_variable_classes-v'gptext'
©CopyrightPivotalSoftware,Inc,2013-2017 12 2.1.2
InstallingGPText
PrerequisitesTheGPTextinstallationincludestheinstallationofApacheSolrCloudand,optionally,ApacheZooKeeper.
IfyouareinstallinganewGPTextreleaseintoanexistingGPTextsystem,followtheinstructionsinUpgradingGPTextinstead.
FollowingareGPTextinstallationprerequisites.
InstallandconfigureyourGreenplumDatabasesystem,version4.3.6orhigher.SeethePivotalGreenplumDatabaseInstallationGuideathttps://gpdb.docs.pivotal.io .
GPTextrunsonRedHatEnterpriseLinux5.xor6.x.
GPTextcannotbeinstalledontoasharedNFSmount.
InstallOracle JRE 1.8.x onallhostsinthecluster.
Ensurethat nc (netcat)isinstalledonallGreenplumclusterhosts( yum install nc ).
Installing lsof onallclusterhostsisrecommended( sudo yum install lsof ).
GPTextnodescanbeinstalledontheGreenplumDatabaseclusterhostsalongsidetheGreenplumsegmentsoronadditional,non-databasehostsaccessibleontheGreenplumclusternetwork.AllhostsparticipatingintheGPTextsystemmusthavethesameoperatingsystemandconfigurationandhavepasswordless-sshaccessforthegpadminuser.SeethePivotalGreenplumDatabaseInstallationGuideforinstructionstoconfigurehosts.
IfyouplantoplaceGPTextnodesontheGreenplumDatabasesegmenthosts,ensurethatyoureservememoryforGPTextusewhenyouconfigureGreenplumDatabase.TodeterminethememorytosetasideforGPText,multiplythenumberofGPTextnodestocreateoneachGreenplumsegmenthostbytheJVMmaximumsize.SubtractthismemoryfromthephysicalRAMwhencalculatingthevaluefortheGreenplumDatabasegp_vmem_protect_limit serverconfigurationparameter.SeetheGreenplumDatabaseserverconfigurationparameter gp_vmem_protect_limit intheGreenplumDatabaseReferenceGuideforrecommendedmemorycalculationformulasorvisittheGPDBVirtualMemoryCalculator website.
ApacheSolrrequiresaZooKeeperclusterwithatminimumthreenodes.Youcaninstalla“binding”ZooKeeperclusterwithGPTextontheGreenplumclusterhosts,oryoucanuseanexistingZooKeepercluster.WhendeployedalongsideGreenplumDatabasesegments,ZooKeeperperformancecanbeaffectedunderheavydatabaseload.Forbestperformance,installaZooKeeperclusterwithatleastthreenodes(fivenodesrecommended)onseparatehostswithnetworkconnectivitytotheGreenplumnetwork.
InstalltheGPTextBinaries1. OntheGreenplummasterhost,extracttheGPTextdistributionfile,acompressedtararchive.Forexample:
cd/home/gpadmintarxvfzgreenplum-text-release-rhel5_x86_64.tar.gz
Thereleasedirectorycontainsaninstallationconfigurationfile, gptext_install_config ,andtheGPTextinstallationbinary,whichhasanamesimilartogreenplum-text-<version>-<platform>.bin ,forexample, greenplum-text-2.1.0-rhel5_x86_64.bin .
2. Ifnecessary,grantexecutepermissiontotheGPTextbinary.Forexample:
chmod+x/home/gpadmin/greenplum-text-2.1.0-rhel5_x86_64.bin
3. IfyouareinstallingGPTextinadirectorythatisonlyaccessibletoroot,forexample /usr/local ,performthesesteps:
a. Createtheinstallationdirectoryasrootandchangetheownershiptothegpadminuser.b. Toinstalltoadirectorywheretheusermayormaynothavewritepermissions:
Usegpsshtocreateadirectorywiththesamefilepathonallhosts( mdw , smdw ,andthesegmenthosts sdw1 , sdw2 ,andsoon).Forexample:
/usr/local/<gptext-version>
Asroot,setthefilepermissionsandowner.Forexample:
GPTextrequiresOracleJDK1.8.x.YoucannotuseanOpenJDKJREwithGPText.
©CopyrightPivotalSoftware,Inc,2013-2017 13 2.1.2
#chmod775/usr/local/<gptext-version>#chowngpadmin:gpadmin/usr/local/<gptext-version>
4. Editthe gptext_install_config filetosetparametersfortheinstallation.SeeSetInstallationParametersfordetails.
5. RuntheGPTextinstallationbinaryas gpadmin onthemasterserver:
./gptext-version.bin-cgptext_install_config
6. AcceptthePivotallicenseagreement.
SetInstallationParametersAGPTextconfigurationfilenamed gptext_install_config containsparameterstoconfiguretheGPTextinstallation.Editthefileandsettheparametersasdescribedinthefollowingtable.
Table1.GPTextinstallationparameters
Parameter Description Example
GPTEXT_HOSTS
AnarrayofhostnamesonwhichtoinstallGPText,orusetheconstant"ALLSEGHOSTS" toinstallGPTextonallGreenplumDatabasesegmenthosts.GPTexthostsmustbepasswordlessssh-accessiblebythegpadminuserfromallotherhostsintheGreenplumCluster.
declare -a GPTEXT_HOSTS=(gptext_host1 gptext_host2 gptext_host3)
GPTEXT_HOSTS="ALLSEGHOSTS"
DATA_DIRECTORY
AnarrayofdirectorypathswhereGPTextdatadirectoriesaretobecreated.ThenumberofdirectoriesinthearraydeterminesthenumberofGPTextnodesthatwillbecreatedoneachphysicalhost.If GPTEXT_HOSTS listsmultipleinterfacesperhost,theGPTextnodesarespreadevenlyacrosstheinterfaceaddresses.
declare -a DATA_DIRECTORY=(/data/primary /data/primary)
JAVA_OPTSSetstheminimumandmaximummemoryeachSolrCloudJVMcanuse.
JAVA_OPTS="-Xms1024M -Xmx2048M"
GPTEXT_PORT_BASE
GP_MAX_PORT_LIMIT
SetarangeofportnumbersavailabletoGPTextnodes.GPTextfindsunusedportsinthespecifiedrange.
GPTEXT_PORT_BASE=18983GP_MAX_PORT_LIMIT=28983
ZOO_CLUSTER
WhethertodeployaGPTextbindingZooKeeperclusteroruseanexistingZooKeepercluster.Ifsetto "BINDING"theinstallationdeploysaZooKeepercluster.TouseanexistingZooKeepercluster,setthisparametertoalistofZooKeepernodesintheformat"host1:port,host2:port,host3:port “.
ZOO_CLUSTER="BINDING"
If ZOO_CLUSTER issetto "BINDING" ,thisparameterisanarrayofthehostswheretheZooKeepernodesaretobeinstalled.
The GPTEXT_HOSTS and DATA_DIRECTORY installationparametersdeterminethenumberofGPTextnodesthataredeployed.Thenumberofdirectoriesincludedinthe DATA_DIRECTORY arrayisthenumberofGPTextnodesthatarecreatedperhost.The GPTEXT_HOSTS parameterdeterminesthenumberofhosts.Ifsettotheconstant "ALLSEGHOSTS" thenumberofGPTextnodehostsisthesameasthenumberofGreenplumsegmenthosts.If GPTEXT_HOSTS issettoanarrayofhostnames,thelengthofthearrayisthenumberofGPTextnodehosts.ThetotalnumberofGPTextnodesshouldnotexceedthenumberofGreenplumDatabaseprimarysegments.
©CopyrightPivotalSoftware,Inc,2013-2017 14 2.1.2
ZOO_HOSTS
Thearraymustcontain3,5,or7hostnames,forexampleZOO_HOSTS=(sdw1 sdw2 swd3 sdw4sdw5)
.IfyouareusingasinglehostforZooKeeper,specifyitmultipletimes,forexample, ZOO_HOSTS=(sdw1 sdw1 swd1) .
declare -a ZOO_HOSTS=(localhost localhost localhost localhostlocalhost)
ZOO_DATA_DIRTheZooKeeperdatadirectory,requiredwhen ZOO_CLUSTER issetto "BINDING" .
ZOO_DATA_DIR="/data/master/"
ZOO_GPTXTNODE
ThenodepathinZooKeeperforGPText.ThisparameterisrequiredwhetherZOO_CLUSTER issetto "BINDING" oralistofhosts.
ZOO_GPTXTNODE="gptext"
ZOO_PORT_BASE
ZOO_MAX_PORT_LIMIT
ArangeofportnumberstousefortheZooKeepercluster.Unusedportsareallocatedfromwithinthisrange.Therangemustcontainatleast4000portnumbers.
ZOO_PORT_BASE=2188ZOO_MAX_PORT_LIMIT=12188
GPTEXT_JAVA_HOME
ThehomedirectoryoftheJavainstallationtorunforZooKeeperandSolrprocesses.Ifnotset,theJREspecifiedinthe PATH andJAVA_HOME environmentvariableswillbeused.
GPTEXT_JAVA_HOME=/usr/java/jdk1.8.0_131
Parameter Description Example
StartingGPTextFirst,makesuretheGPTextcommand-lineutilitiesareinyourpathbysourcingtheGreenplumDatabaseandGPTextenvironmentscripts.ItisimportanttosourcetheGPTextenvironmentscripteachtimeyousourcetheGreenplumDatabasescript.Forexample:
source/usr/local/greenplum-db-<version>/greenplum_path.shsource/usr/local/greenplum-text-<version>/greenplum-text_path.sh
TouseGPTextinadatabase,youmustfirstusethe gptext-installsql managementutilitytoinstalltheGPTextuser-definedfunctionsandotherobjectsinthedatabase:
gptext-installsqldatabase[database2...]
TheGPTextobjectsarecreatedinthe gptext schema.
TheZooKeeperclustermustberunningbeforeyoustartGPText.IfyouinstalledaboundZooKeepercluster,startitwiththe zkManager command-lineutility.
$zkManagerstart
StartGPTextwiththe gptext-start utility.
$gptext-start
UninstallingGPTextTouninstallGPText,runthe gptext-uninstall utility.YoumusthavesuperuserpermissionsonalldatabaseswithGPTextschemastorun gptext-uninstall .
gptext-uninstall runsonlyifthereisatleastonedatabasewithaGPTextschema.
Execute:
gptext-uninstall
©CopyrightPivotalSoftware,Inc,2013-2017 15 2.1.2
©CopyrightPivotalSoftware,Inc,2013-2017 16 2.1.2
UpgradingGPTextUpgradingaGPTextsystemtoanewGPTextreleaseinstallsthenewGPTextsoftwarereleaseonallhostsintheGreenplumclusterandthenupgradestheGPTextsystem.
UpgradingGPTextandGreenplumDatabaseattheSameTimeIfyouareupgradingtonewreleasesofGreenplumDatabaseandGPTextatthesametime,followthesesteps:
1. CompletetheGreenplumDatabaseupgradefirstandensurethedatabaseisoperational.
2. RuntheGPText gptext-migrator utilitytomigrateyourcurrentGPTextsystemtothenewlyupgradedGreenplumDatabasesystem.
3. EnsurethatthecurrentversionofGPTextworkswiththenewGreenplumDatabaseversion.
4. ProceedwiththeGPTextupgrade.
UpgradingaGPTextReleaseUpgradingaGPTextreleaseisatwo-partprocess:installthenewsoftwarereleaseontheGreenplumclusterhostsandthenupgradetheexistingGPTextsystem.TheGPTextinstallerperformsthefirstpart,installingthenewsoftware.The gptext-upgrade utilityperformsthesecondpart,upgradingthecurrentGPTextsystemtothenewversion.
TheGPTextinstallerdetectsanexistingGPTextsystemand,afterinstallingthenewsoftwarerelease,offerstorunthe gptext-upgrade utilityforyou.IfyouchoosetoupgradetheGPTextsystemlater,youcanrunthe gptext-upgrade utilityyourself.
AllupgradetasksareexecutedontheGreenplummasterhostasthe gpadmin user.The gpadmin usermusthavewritepermissioninthedirectorywherethenewGPTextreleaseistobeinstalled, /usr/local/greenplum-text-<release>-<version> bydefault.
YoumuststoptheGPTextsystembeforeupgrading.TheGreenplumDatabaseandZooKeeperclustersmustberunning.
Followthesesteps:
1. DownloadthenewGPTextreleaseforyourplatformfromPivotalNetwork .
2. Extractthereleasepackage.
$tarxfzgreenplum-text-<version>-<platform>.tar.gz
3. ShutdowntheGPTextsystem.
$gptext-stop
4. RuntheGPTextinstaller.
$./greenplum-text-<version>-<platform>.bin
5. TheinstallerpromptsyoutoacceptthePivotallicenseagreementandtochooseandcreatetheinstallationdirectory.
6. Theinstallerverifiestheenvironmenttoensurethatprerequisitesarepresent,suchasPythonandJava.Ifanyproblemsarediscovered,theinstalleroutputsanerrormessageandstops.Correcttheproblemidentifiedbythemessageandruntheinstalleragain.
7. AfterthenewsoftwarehasbeeninstalledontheGreenplumcluster,theinstallerlooksforanexistingGPTextinstallation.IfanexistingGPTextsystemisfound,theinstallerasksifyouwishtoupgradeGPTextdirectly.
Ifyouansweryes,theinstallerrunsthe gptext-upgrade script.The gptext-upgrade utilityvalidatestheenvironmenttoensureitcancompletetheupgrade,thenexecutestheupgradeandrestartstheGPTextsystem.Ifanyproblemsarediscovered, gptext-upgrade outputsamessageandquits.Fixtheindicatedproblemsandrunthegptext-upgradeutility(at <NEW_GPTEXTHOME>/bin/gptext-upgrade )tocompletetheGPTextsystemupgrade.
WhenupgradingGPText,youdonotspecifyaninstallationconfigurationfileasyoudofortheinitialGPTextinstallation.
©CopyrightPivotalSoftware,Inc,2013-2017 17 2.1.2
Ifyouanswerno,youmustrunthe gptext-upgrade scriptaftertheinstallercompletes.Seethegptext-upgradeutilityreferenceforinstructions.
8. Aftertheupgradehascompleted,sourcethe greenplum-text_path.sh inthenewGPTextreleasedirectoryandrun gptext-statehealthcheck toverifytheGPTextsystem:
$source/usr/local/greenplum-text-<version>/greenplum-text_path.sh$gptext-statehealthcheck
©CopyrightPivotalSoftware,Inc,2013-2017 18 2.1.2
UsingPivotalGPTextIntroductiontoPivotalGPText
WorkingWithGPTextIndexes
QueryingGPTextIndexes
CustomingGPTextIndexes
AdministeringGPText
GPTextHighAvailability
Glossary
©CopyrightPivotalSoftware,Inc,2013-2017 19 2.1.2
IntroductiontoPivotalGPTextPivotalGPTextenablesprocessingmassquantitiesofrawtextdata(suchassocialmediafeedsore-maildatabases)intomission-criticalinformationthatguidesbusinessandprojectdecisions.GPTextjoinstheGreenplumDatabasemassivelyparallel-processingdatabaseserverwithApacheSolrCloudenterprisesearchandtheMADlibAnalyticsLibrarytoprovidelarge-scaleanalyticsprocessingandbusinessdecisionsupport.GPTextincludesfreetextsearchaswellassupportfortextanalysis.GPTextsupportsbusinessdecisionmakingbyoffering:
Multiplekindsofdata:GPTextsupportsbothsemi-structuredandunstructureddatasearches,whichexponentiallyincreasesthekindsofinformationyoucanfind.
Lessschemadependence:GPTextdoesnotrequirestaticschemastosuccessfullylocateinformation;schemascanchangeorbequitesimpleandstillreturntargetedresults.
Textanalytics:GPTextsupportsanalysisoftextdatawithmachinelearningalgorithms.TheMADlibanalyticslibraryisintegratedwithGreenplumDatabaseandisavailableforusewithGPText.
Thischaptercontainsthefollowingtopics:
GPTextSystemArchitecture
GPTextSampleUseCase
GPTextWorkflow
TextAnalysis
GPTextSystemArchitectureGPTextcombinesaGreenplumDatabaseclusterwithanApacheSolrCloudcluster.GreenplumDatabasesegmentsandGPTextnodescanbedeployedonthesamehostsorondifferenthostswithnetworkconnectivity.
ThefollowingfigureshowstheprocessarchitectureofthecombinedGreenplumDatabaseandApacheSolrclusters.ThefigureshowsfourclusternodeswithfourGreenplumsegmentsandfourSolrinstancesdeployedoneach.AnApacheZooKeeperservicemanagestheSolrCloudcluster.BecauseZooKeeperismostefficientwithanoddnumberofservers,ZooKeepernodesaredeployedonthreeofthefourhosts.GreenplumDatabaseusersaccessSolrCloudservicesviaGPTextuser-definedfunctionsinstalledinGreenplumdatabasesandcommand-lineutilities.
ThefigureomitstheGreenplummasterhost,secondarymaster,andmirrorsegmentsfortheGreenplumprimarysegments.
©CopyrightPivotalSoftware,Inc,2013-2017 20 2.1.2
TheGreenplumsegments,Solrinstances,andZooKeepernodesmayallbedeployedonseparatehostsonthesamenetwork,dependingonapplicationandperformancerequirements.
ThefollowingsectionsdescribehowGPTextintegratesSolrCloudwithGreenplumDatabaseandhowthetwoclustersworktogethertoprovideparalleltextsearchcapabilitiesinGreenplumDatabaseandmaintainhighavailability.
GreenplumDatabaseClusterAGreenplumDatabaseclusteriscomprisedofthefollowingcomponents:
Amasterdatabaseinstance,executingonadedicatedhost,conventionallynamed mdw .(Notillustrated)
Asecondarymasterinstance,onahostconventionallynamed smdw ,actingasawarmstandbyforthemasterinstance.(Notillustrated)
Anarrayofdatabaseprimarysegmentinstancesandmirrorsdeployedonsegmenthosts,byconvention sdw1 through sdwn .AsegmentinstanceisanindependentPostgresdatabaseprocessmanagingaportionofthedistributeddata.Eachsegmenthasamirror(notillustrated)onanotherhostintheclustertoprovideuninterruptedserviceincaseofasegmentorsegmenthostfailure.Thenumberofprimarysegmentsperhostisdeterminedbythehardwareconfiguration—thenumberandtypeofprocessorcores,theamountofphysicalRAM,localstoragecapacity,andnetworkcapacity—aswellasavailabilityandperformancerequirements.
TheGreenplummasterinstancecoordinatestheworkofthesegmentinstances.OptimalperformanceofaGreenplumDatabaseclusterrequiresthatallsegmenthostsbeconfiguredidenticallywiththesamenumberofprimaryandmirrorsegmentsoneach,andwiththedatabasedatadistributedevenlyamongthesegmentinstances.Thefullcapacityofthedatabaseclusterisutilizedwheneverysegmenthostperformsanequalamountofwork.
ApacheSolrCloudApacheSolrisaserverprovidingaccesstoApacheLucenefull-textindexes.ApacheSolrCloudisahighlyavailable,faulttolerantclusterofApacheSolrservers.ThetermGPTextclusterisanotherwaytorefertoaSolrCloudclusterdeployedbyGPTextforusewithaGreenplumDatabasesystem.
ASolrCloudclusteriscomprisedofthefollowingcomponents:
AnApacheZooKeeperclustertomanagetheSolrCloudcluster.SolrCloudusesZooKeepertomanageserverconfigurationandtocoordinatethecluster’sactivities.AZooKeeperclustershouldhaveanoddnumberofservers.GPTextcaninstallaZooKeeperclusterthatisboundtotheGPTextcluster,oritcanshareanexistingZooKeepercluster.IfGPTextinstallstheZooKeepercluster,itcanbemanagedusingGPTextfunctionsandutilities.
MultipleSolrCloudserverinstancesdeployedontheGreenplumsegmenthostsoronotherhostsonthesamenetwork.EachinstanceisaJVMprocessrunningSolrserver.SolrCloudinstancesuselocalstorage,whichmaybethesamelocalstoragevolumesthatstoreGreenplumDatabasedata.ThenumberofSolrCloudinstancesperhostcanbethesameasthenumberofGreenplumprimarysegmentsperhost,butthisisnotarequirement.ThenumberofinstancestoexecuteperhostisspecifiedduringGPTextinstallation.
GPTextprovidesdocumentindexingandsearchcapabilitiesforGreenplumDatabasebyaddinguser-definedfunctions(UDFs)thataccessSolrAPIsfromwithindatabasequeries.
GPTextUDFsperformthefollowingtasks:
createandmanageGPTextindexes
insertdocumentsintoindexes
searchindexes
TherearealsoGPTextUDFsandcommand-lineutilitiestoconfigure,monitor,andmanagetheSolrCloudclusterandtomanagereplicas,SolrCloud’shigh-availabilitymechanism.(Moreonreplicasinthenextsection.)
ParallelisminGPTextIndexingandSearchingSolrClouddistributesdocumentindexesinslicescalledshards.WithGPText,thenumberofshardsforanindexisthesameasthenumberofGreenplumsegments,soeachGreenplumsegmentoperatesonanequalportionoftheindex.EachshardismanagedbyaSolrCloudinstanceandtheshardsaredistributedevenlyamongtheSolrCloudinstances.TheSolrCloudinstanceandGreenplumsegmentarenotrequiredtobeonthesamehost.
HighAvailabilityforGPTextIndexesSolrCloudprovideshighavailabilitybymaintainingreplicasofshardsandprovidingautomaticfailoverifashardfailsorbecomesunavailable.Onereplicaofeachshardistheleadreplicaandanychangestoitareappliedtotheotherreplicas.Thereplicationfactor,whichdeterminesthenumberofreplicasto
©CopyrightPivotalSoftware,Inc,2013-2017 21 2.1.2
maintainforeachshard,issetwhentheindexiscreated.ReplicasmayalsobeaddedordroppedlaterusingGPTextUDFsorcommand-lineutilities.
ZooKeeperdeterminesthelocationsofshardreplicasamongtheSolrnodesandhosts.WhenaddingareplicausingaGPTextUDForcommand-lineutility,anewshardcanbeexplicitlyplacedonaSolrCloudinstance.
GPTextSampleUseCaseForensicfinancialanalystsneedtolocatecommunicationsamongcorporateexecutivesthatpointtofinancialmalfeasanceintheirfirm.Theanalystsusethefollowingworkflow:
1. LoadtheemailrecordsintoaGreenplumdatabase.
2. CreateaSolrindexoftheemailrecords.
3. Runqueriesthatlookfortextstringsandtheirauthors.
4. Refinethequeriesuntiltheypairadummycompanynamewithtopthreeorfourexecutivescorrespondingaboutsuspectoffshorefinancialtransactions.Withthisdata,theanalystscanfocustheinvestigationonspecificindividualsratherthanthethousandsofauthorsintheinitialdatasample.
GPTextWorkflowGPTextworkswithGreenplumDatabaseandApacheSolrCloudtostoreandindexbigdataforinformationretrieval(query)purposes.High-levelworkflowsincludedataloadingandindexing,anddataquerying.
Thistopicdescribesthefollowinginformation:
DataLoadingandIndexingWorkflow
QueryingDataWorkflow
DataLoadingandIndexingWorkflowThefollowingdiagramshowstheGPTextworkflowforloadingandindexingdata.
AllclientinteractionwiththesystemisthroughtheGreenplummasterinstance.
1. LoaddataintoyourGreenplumDatabasesystem.Createadatabasetabletoholddataandthenaddthedatatothetable.Greenplumprovidesparalleldataloadingutilitiesandprotocolsthathelptotransformandloadexternaldatainvariousformatsandfromvarioussources.Fordetails,seetheGreenplumDatabaseAdministratorGuide,athttp://gpdb.docs.pivotal.io .
2. CreateanemptyGPTextindex.
©CopyrightPivotalSoftware,Inc,2013-2017 22 2.1.2
Usethe gptext.create_index() user-definedfunction(UDF)tocreateanemptyGPTextindexforthetable.EachGreenplumsegmentwillmanageasliceoftheindex,calledashard.SolrCloudcreatesmultiplereplicasforeachshard,distributedamongtheSolrinstances,andchoosesaleadreplicafortheGreenplumsegmenttooperateupon.Solrmanagesreplicationbetweenthereplicas.
3. Populatetheindexwithdatafromthedatabasetable.Usethe gptext.index() UDFtoadddatatotheindex.ThisUDFworksbydispatchingaSQLquerytoexecuteoneachGreenplumsegment.ThesegmentsexecutethequeryandaddtheresultstotheirshardsusingSolrAPIs.
4. Commitchangestotheindex.CommitchangestotheGPTextindexbycallingthe gptext.commit_index() UDF.Untilthechangesarecommitted,queriesexecutedontheindexcannotaccessanydataaddedtotheindexwith gptext.index() .Ifneeded,uncommittedchangescanberolledback.SolrCloudreplicateschangescommittedtotheleadreplicatotheshards’non-leadreplicas.
QueryingDataWorkflowThefollowingdiagramshowsthehigh-levelGPTextqueryprocessworkflow:
1. AusersubmitsaSQLquerydesignedtosearchtheindexeddata.AGPTextsearchqueryisaSQL SELECT statementonaGPTextsearchUDFthatcontainsfull-textsearchexpressions.
2. TheGreenplummasterdispatchesthequerytotheGreenplumsegments.
3. Eachsegmentexecutesthequery,usingtheSolrAPItosearchitsindexshard.SolrCloudexecutesthesearchqueryontheleadreplicafortheshard.
4. TheGreenplumsegmentsreturntheresultsofthesearchquerytotheGreenplummaster.
5. TheGreenplummasteraggregatestheresultsfromallsegmentsandreturnsthemtotheclient.
TextAnalysisGPTextenablesanalysisofSolrindexeswithApacheMADlib,anopensourcelibraryforscalablein-databaseanalytics.MADlibprovidesdata-parallelimplementationsofmathematical,statistical,andmachinelearningmethodsforstructuredandunstructureddata.YoucanuseGPTexttoperformavarietyofMADlibanalyses.
LearnmoreaboutApacheMADlibathttp://madlib.apache.org .A gppkg packageforMADlibisavailableonthePivotalnetworkathttp://network.pivotal.io .
©CopyrightPivotalSoftware,Inc,2013-2017 23 2.1.2
WorkingWithGPTextIndexesIndexingpreparesdocumentsfortextanalysisandfastqueryprocessing.ThistopicshowsyouhowtocreateGPTextindexesandadddocumentstothem,andtomaintainandcustomizeindexesforyourownapplications.
ToworkwithGPTextindexes,youmustfirstsetupyourenvironmentandaddtheGPTextschematothedatabasecontainingthedocuments(GreenplumDatabasedata)youwanttoindex.
SettheGPTextcommand-lineenvironmentTosettheenvironment,loginasthe gpadmin userandsourcetheGreenplumDatabaseandGPTextenvironmentscripts.TheGreenplumDatabaseenvironmentmustbesetbeforeyousourcetheGPTextenvironmentscript.Forexample,ifbothGreenplumDatabaseandGPTextareinstalledinthe/usr/local/ directory,enterthesecommands:
$source/usr/local/greenplum-db-<version>/greenplum_path.sh$source/usr/local/greenplum-text-<version>/greenplum-text_path.sh
Withtheenvironmentnowset,youcanaccesstheGPTextcommand-lineutilities.
AddtheGPTextschematoadatabaseUsethe gptext-installsql utilitytoaddtheGPTextschematodatabasescontainingdatayouwanttoindexwithGPText.Youperformthistaskonetimeforeachdatabase.Inthisexample,the gptext schemaisinstalledintothe wikipedia database.Enterthefollowingcommandstocreatethe wikipediadatabaseandinstalltheGPTextschema.
$createdbwikipedia$gptext-installsqlwikipedia
The gptext schemaprovidesuser-definedtypes,tables,views,andfunctionsforGPText.ThisschemaisreservedforGPText.Ifyoucreateanynewobjectsinthe gptext schema,theywillbelostwhenyoureinstalltheschemaorupgradeGPText.
CreatingGPTextIndexesandIndexingDataThegeneralstepsforcreatingaGPTextindexandindexingdocumentsare:
1. CreateanemptySolrindex
2. Populatetheindex
3. Committheindex
Afteryoucompletethesesteps,youcancreateandexecuteasearchqueryorimplementmachinelearningalgorithms.SearchingGPTextindexesisdescribedintheQueryingGPTextIndexestopic.
ThefollowingstepsarecompletedbyexecutingSQLcommandsandGPTextfunctionsinthedatabase.RefertotheGPTextFunctionReferencefordetailsabouttheGPTextfunctionsdescribedinthefollowingexamples.
Toopenaninteractiveshellforexecutingqueriesinthe wikipedia database,execute:
$psqlwikipedia
Theexamplesinthissectionuseatablecalled articles inthe wikipedia database.Thetableiscreatedinthedefault public schemawiththefollowingstatement.
©CopyrightPivotalSoftware,Inc,2013-2017 24 2.1.2
CREATETABLEarticles(idint8primarykey,date_timetimestamptz,titletext,contenttext,refstext)DISTRIBUTEDBY(id);
CreateanemptyGPTextindexAGPTextindexisanApacheSolrcollectioncontainingdocumentsaddedfromaGreenplumDatabasetable.TherecanbeoneGPTextindexperGreenplumDatabasetable.EachrowinthedatabasetableisadocumentthatcanbeaddedtotheGPTextindex.
Ifthedatabasetableispartitioned,thereisoneGPTextindexforallpartitions.Youmustspecifytheroottablenamewhencreatingtheindexandaddingdocumentstoit.GPTextprovidessearchsemanticsthatenablesearchingpartitionsefficiently.
The gptext.create_index() functioncreatesanewGPTextindex.Thisfunctionhastwosignatures:
gptext.create_index(<schema_name>,<table_name>,<id_col_name>,<def_search_col_name>[,<if_check_id_uniqueness>])
or
gptext.create_index(<schema_name>,<table_name>,<p_columns>,<p_types>,<id_col_name>,<def_search_col_name>[,<if_check_id_uniquess>])
The <schema_name> and <table_name> argumentsspecifythedatabasetablethatcontainsthesourcedocuments.
The <id_col_name> argumentisthenameofthetablecolumnthatcontainsauniqueidentifierforeachrow.The <id_col_name> columncanbeoftypeint4 , int8 , varchar , text ,or uuid .
The <def_search_col_name> argumentisthenameofthetablecolumnthatcontainsthecontentyouwanttosearchbydefault.Forexample,ifyouwanttoindexandsearchjustthe <content> column,youcanusethefirstsignatureandspecifythe content columnnameinthe <def_search_col_name> argument:
SELECT*FROMgptext.create_index('public','articles','id','content');
Thefinal,optionalargument, <if_check_id_uniqueness> ,isaBooleanargument.Whentrue,thedefault,attemptingtoaddadocumentwithanidthatalreadyexistsintheindexgeneratesanerror.Ifyousettheargumenttofalse,youcanadddocumentswiththesameid,butwhenyousearchtheindexalldocumentswiththesameIDarereturned.
Ifyouwanttoindexandsearchmultiplecolumns,youcanusethesecond gptext.create_index() signature,specifyingthecolumnstoindexinthe<p_columns> argumentandthedatatypesofthosecolumnsinthe <p_types> argument.The <p_columns> and <p_types> argumentsaretextarrays.Theidcolumnanddefaultsearchcolumnmustbeincludedinthearrays.
ThisexamplecreatesanindexyoucanuseyoutorunGPTextsearchesonthe title , content ,and refs columns.
SELECT*FROMgptext.create_index('public','articles','{id,title,content,refs}','{long,text,text,text}','id','content',true);
Notethattheidcolumnanddefaultsearchcolumnarestillspecifiedinseparateargumentsfollowingthearrays.
Youcandownloadtestdataforthistablehere .Right-clickthelinkandsavethefileandthencopyittothegpadminuser’shomedirectory.Loadthedataintothe articles tableusingthe psql\COPY metacommand:
\COPY public.articles(id, date_time, title, content, refs) FROM '/home/gpadmin/articles.csv' HEADER CSV;
Thedatacontains23articlesfromWikipedia.org.
©CopyrightPivotalSoftware,Inc,2013-2017 25 2.1.2
=#SELECT*FROMgptext.create_index('public','articles','{id,title,content,refs}',id,'{long,text,text,text}','content',true);INFO:Createdindexwikipedia.public.articlescreate_index--------------t(1row)
Toverifytheindexiscreated,enterthisquery:
=#SELECT*FROMgptext.index_status('wikipedia.public.articles');content_id|index_name|shard_name|shard_state|replica_name|replica_state|core|node_name|base_url|is_leader|partitioned------------+---------------------------+------------+-------------+--------------+---------------+-------------------------------------------+------------------------------+-------------------------------------+-----------+-------------0|wikipedia.public.articles|shard0|active|core_node2|active|wikipedia.public.articles_shard0_replica1|gpsne.localdomain:18984_solr|http://gpsne.localdomain:18984/solr|t|f0|wikipedia.public.articles|shard0|active|core_node4|active|wikipedia.public.articles_shard0_replica2|gpsne.localdomain:18983_solr|http://gpsne.localdomain:18983/solr|f|f1|wikipedia.public.articles|shard1|active|core_node1|active|wikipedia.public.articles_shard1_replica2|gpsne.localdomain:18983_solr|http://gpsne.localdomain:18983/solr|f|f1|wikipedia.public.articles|shard1|active|core_node3|active|wikipedia.public.articles_shard1_replica1|gpsne.localdomain:18984_solr|http://gpsne.localdomain:18984/solr|t|f(4rows)
ThisexampleexecutedonaGreenplumDatabaseclusterwithtwoprimarysegments.Twoshardswerecreated,oneforeachsegment,andeachshardhastworeplicas.Thereplicasarenamedcore_node1throughcore_node4.
Youcanalsorunthe gptext-state-D
command-lineutilitytoverifytheindexwascreated.Seethegptext-statereferencefordetails.
PopulatetheindexTopopulatetheindex,usethetablefunction gptext.index() ,whichhasthefollowingsyntax:
SELECT*FROMgptext.index(TABLE(SELECT*FROM<table_name>),<index_name>);
Forexample:
SELECT*FROMgptext.index(TABLE(SELECT*FROMarticles),'wikipedia.public.articles');dbid|num_docs------+----------3|112|12(2rows)
Theresultsofthiscommandshowthat23documentsfromtwosegmentswereaddedtotheindex.
Thefirstargumentofthe gptext.index() functionisa“table-valuedexpression.” TABLE(SELECT*FROMarticles) createsa“table-valuedexpression”fromthearticlestable,usingthetablefunction TABLE .
Youcanchoosethedatatoindexorupdatebychangingtheinnerselectlistinthequerytoselecttherowsyouwanttoindex.Whenaddingnewdocumentstoanexistingindex,forexample,specifya WHERE clauseinthe gptext.index() calltochooseonlythenewrowstoindex.
Theinner SELECT statementcouldalsobeaqueryonadifferenttablewiththesamestructure,oraresultsetconstructedwithanarbitrarilycomplexjoin,providedthecolumnsspecifiedinthe gptext.create_index() functionarepresentintheresults.Ifyouindexdatafromasourceotherthanthetableusedtocreatetheindex,besurethedistributionkeyfortheresultsetmatchesthedistributionkeyofthebasetable.TheGreenplumDatabase SELECT
statementhasa SCATTERBY clausethatyoucanusetospecifythedistributionkeyfortheresultsfromaquery.SeeSpecifyingadistributionkeywithSCATTERBYformoreaboutthedistributionpolicyandGPTextindexes.
Committheindex
©CopyrightPivotalSoftware,Inc,2013-2017 26 2.1.2
Afteryoucreateandpopulateanindex,youcommittheindexusing gptext.commit_index(<index_name>) .
Thisexamplecommitsthedocumentsaddedtotheindexinthepreviousexample.
=#SELECT*FROMgptext.commit_index('wikipedia.public.articles');commit_index--------------t(1row)
Anynewdataaddedtotheindexsincethelastcommitiscommittedtotheindex.
ManagingGPTextIndexesGPTextprovidescommand-lineutilitiesandfunctionsyoucanusetoperformtheseGPTextmanagementtasks:
Configuringanindex
Optimizinganindex
SpecifyingadistributionpolicywithSCATTERBY
Deletingfromanindex
Droppinganindex
Addingafieldtoanindex
Droppingafieldfromanindex
Listingallindexes
ConfiguringanindexYoucanmodifyyourindexingbehaviorgloballybyusingthe gptext-config utilitytoeditasetofindexconfigurationfiles.Thefilesyoucaneditwithgptext-config are:
solrconfig.xml –ContainsmostoftheparametersforconfiguringSolritself(seehttp://wiki.apache.org/solr/SolrConfigXml ).
managed-schema –DefinestheanalysischainsthatSolrusesforvariousdifferenttypesofsearchfields(seeTextAnalysisChains).
stopwords.txt –Listswordsyouwanttoeliminatefromthefinalindex.
protwords.txt –Listsprotectedwordsthatyoudonotwanttobemodifiedbytheanalysischain.Forexample,iPhone.
synonyms.txt –Listswordsthatyouwantreplacedbysynonymsintheanalysischain.
elevate.xml –Movesspecificwordstothetopofyourfinalindex.
emoticons.txt –Definesemoticonsforthe text_sm socialmediaanalysischain.(seeTheemoticons.txtfile).
Youcanalsouse gptext-config tomovefiles.
OptimizinganindexThefunction gptext.optimize_index(<index_name>,<max_segments>) mergesallsegmentsintoasmallnumberofsegments( <max_segments> )forincreasedefficiency.
Example:
=#SELECT*FROMgptext.optimize_index('wikipedia.public.articles',10);optimize_index----------------t(1row)
SpecifyingadistributionpolicywithSCATTERBY
©CopyrightPivotalSoftware,Inc,2013-2017 27 2.1.2
Thefirstparameterof gptext.index() isatable-valuedexpression,suchas TABLE(SELECT*FROMarticles) .ThequeryinthisparametermusthavethesamedistributionpolicyasthetableyouareindexingsothatdocumentsaddedtotheindexareassociatedwiththecorrectGreenplumDatabasesegments.Somequeries,however,havenodistributionpolicyortheyhaveadifferentdistributionpolicy.Thiscouldhappenifthequeryisajoinoftwoormoretablesoraqueryonanintermediate(staging)tablethatisdistributeddifferentlythanthebasetablefortheindex.
Tospecifyadistributionpolicyforaqueryresultset,theGreenplumDatabaseSELECTstatementhasa“SCATTERBY”clause.
TABLE(SELECT*FROMarticlesSCATTERBY<distrib_id>)
where distrib_id isthesamedistributionkeyusedtodistributethebasetablefortheindex.
DeletingfromanindexYoucandeletefromanindexusingaquerywiththefunction gptext.delete(<index_name>,<query>) .Thisdeletesfromtheindexalldocumentsthatmatchthesearchquery.Todeletealldocuments,usethequery '*' .
Afterasuccessfuldeletion,execute gptext.commit_index(<index_name>) tocommitthechange.
Thisexampledeletesalldocumentscontaining "toxin" inthedefaultsearchfield.
=#SELECT*FROMgptext.delete('wikipedia.public.articles','toxin');delete--------t(1row)
SELECT*FROMgptext.commit_index('wikipedia.public.articles');
Examplethatdeletesalldocumentsfromtheindex:
SELECT*FROMgptext.delete('wikipedia.public.articles','*:*');
SELECT*FROMgptext.commit_index('wikipedia.public.articles');
DroppinganindexYoucancompletelyremoveanindexwiththe gptext.drop_index(<index_name>) function.
Example:
SELECT*FROMgptext.drop_index('wikipedia.public.articles');
AddingafieldtoanindexYoucanaddafieldtoanexistingindexusingthe gptext.add_field() function.Forexample,youcanaddafieldtotheindexafteracolumnisaddedtotheunderlyingdatabasetableoryoucanaddafieldtoindexacolumnthatwasnotspecifiedwhentheindexwascreated.
GPTextmapstheGreenplumDatabasefieldtypetoanequivalentSolrdatatypeautomatically.SeeGPTextandSolrDataTypeMappingsforatableofdatatypemappings.
©CopyrightPivotalSoftware,Inc,2013-2017 28 2.1.2
CREATETABLEarticles(idint8primarykey,date_timetimestamptz,titletext,contenttext,refstext)DISTRIBUTEDBY(id);
SELECT*FROMgptext.create_index('public','articles','id','content',true);...populatetheindex...SELECT*FROMgptext.commit_index('wikipedia.public.articles');
ALTERTABLEarticlesADDnotestext;SELECT*FROMgptext.add_field('wikipedia.public.articles','notes',false,false);SELECT*FROMgptext.reload_index('wikipedia.public.articles');
AddingafieldtoaGPTextindexrequiresthebasetabletobeavailable.Ifyoudropthetableaftercreatingtheindex,youcannotaddfieldstotheindex.
DroppingafieldfromanindexYoucandropafieldfromanexistingindexwiththe gptext.drop_field() function.Afteryouhavedroppedfields,call gptext.reload_index() toreloadtheindex.
Example:
SELECT*FROMgptext.drop_field('wikipedia.public.articles','notes');SELECT*FROMgptext.reload_index('wikipedia.public.articles');
ListingallindexesYoucanlistallindexesintheGPTextclusterusingthe gptext-state command-lineutility.Forexample:
$gptext-state-D20170822:10:11:23:029752gptext-state:gpsne:gpadmin-[INFO]:-ExecuteGPTextstate...20170822:10:11:23:029752gptext-state:gpsne:gpadmin-[INFO]:-Checkzookeeperclusterstate...20170822:10:11:23:029752gptext-state:gpsne:gpadmin-[INFO]:-CheckGPTextclusterstatus...20170822:10:11:23:029752gptext-state:gpsne:gpadmin-[INFO]:-CurrentGPTextVersion:2.1.220170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:-Allnodesareupandrunning.20170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------20170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:-Indexstatedetails.20170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------20170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:-databaseindexnamestate20170822:10:11:24:029752gptext-state:gpsne:gpadmin-[INFO]:-wikipediawikipedia.public.articlesGreen20170822:10:11:28:029752gptext-state:gpsne:gpadmin-[INFO]:-Done.
StoringFieldContentinanIndexSolrcanstorethecontentsofcolumnsintheindexsothatresultsofasearchontheindexcanincludethecolumncontents.Thismakesitunnecessarytojointhesearchqueryresultswiththeoriginaltable.Youcanevenstorethecontentsofdatabasecolumnsthatarenotindexedandreturnthatcontentwithsearchresults.GPTextpackstheadditionalfieldsinabufferaddedtothesearchresults.Individualfieldscanberetrievedfromthisbufferusingthegptext.gptext_retrieve_field() , gptext.gptext_retrieve_field_int() ,and gptext.gptext_retrieve_field_float() functions.
OneGPTextdesignpatternistostoreallofatable’scolumnsinaGPTextindex.Allofthetable’scontentcanberetrievedwithGPTextqueries,sothedatabasetablecanthenbetruncatedordroppedtoeliminateredundantdiskstorage.AdditionaldocumentscanbeaddedtotheGPTextindexlaterbyinsertingthemintothetruncatedtable,orintoatemporarytablewiththesamestructure,andthenaddingthemtotheindexwiththe gptext.index()function.
Entriesforeachfieldinthe managed-schema fileforaGPTextindexdetermineifthefieldisindexedorstored.Youcaneditthe managed-schema filetoadd,remove,ormodifytheentriesforeachfield.
Followthesestepstoconfigureanindextostoreadditionalfields.
1. Use gptext-config toeditthe managed-schema file.
gptext-config-iwikipedia.public.articles-fmanaged-schema
©CopyrightPivotalSoftware,Inc,2013-2017 29 2.1.2
2. Findthe <field> elementsforthecolumnsyouwanttostoreintheindex.Notethat <field> elementswithnamesbeginningwithanunderscoreareinternalfieldsandshouldnotbemodified.Thefieldsnamed“id”and“content”inthisexamplecoorespondtothe <id_col> and<def_search_col_name> columnsinthe gptext.create_index() function.
<fieldname="__pk"type="long"indexed="true"stored="true"/><fieldname="__temp_field"type="text"multiValued="true"indexed="true"stored="false"/><fieldname="_version_"type="long"indexed="true"stored="true"/><fieldname="content"type="text"indexed="true"stored="false"/><fieldname="id"type="long"indexed="true"stored="true"/><fieldname="refs"type="text"indexed="true"stored="false"/><fieldname="title"type="text"indexed="true"stored="false"/>
3. Addanew <field> elementforanyadditionaltablecolumnyouwanttostoreintheGPTextindex.Setthe type attributeoftheelementtoatypethatmatchesthedatabasecolumntype.(Thevalidtypesaredefinedby <fieldType> elementsinthesame managed-schema file.)
4. Foreachfieldyouwanttostoreintheindex,changethe stored attributefrom "false" to "true" .
<fieldname="title"stored="true"type="text"indexed="true"/><fieldname="content"stored="true"type="text"indexed="true"/><fieldname="refs"stored="true"type="text"indexed="true"/>```
5. Savethefileand,ifanydocumentswerealreadyaddedtotheindex,reindexthetable.SeeRetrievingAdditionalStoredFieldsforinformationaboutaddingadditionalstoredfieldstotheresultsofaGPTextquery.
Formoreaboutthecontentsofthe managed-schema fileandadditionalwaystocustomizeGPTextindexesseeCustomizingGPTextIndexes.
CreatingaGPTextindexforaGreenplumDatabasepartitionedtableWhenyoucreateaGPTextindexforapartitionedGreenplumDatabasetable,GPTextcreatesanindexwitha __partition field.Whenyouadddocumentstotheindex,GPTextsavesthechildpartitiontablenameinthe __partition field,allowingyoutocreateGPTextqueriesthatsearchandfilterbypartition.
UnlikeGreenplumDatabase,whichmanageschildpartitionsinseparatedatabasetables,GPTextdoesnotcreateaseparateSolrcollectionforeachdatabasepartitionbecausethelargernumberofSolrcorescouldadverselyaffectthecapacityandperformanceoftheSolrcluster.
CreatingaGPTextindexforapartitionedtableandaddingdocumentstotheindexisnodifferentthancreatinganindexforanunpartitionedtable.
Thefollowing CREATETABLE statementcreatesa tweets tablepartitionedbymonth.
CREATETABLEtweets(idbigint,createdtimestamp,is_retweetboolean,retweetsint,tweettext)DISTRIBUTEDBY(id)PARTITIONBYRANGE(created)(START(DATE'2009-01-01')INCLUSIVEEND(DATE'2018-01-01')EXCLUSIVEEVERY(INTERVAL'1month'));
ThisstatementcreatesaGPTextindexforthe tweets table.
SELECT*FROMgptext.create_index('public','tweets','id','tweet',true);
Notethatyoumustsupplythenameoftherootpartition, tweets inthisexample,orthe gptext.create_index() functionwillissueanerrormessage.
Thiscommandaddsrowsfromthe tweets tabletotheGPTextindex.
SELECT*FROMgptext.index(TABLE(SELECT*FROMtweets),'demo.public.tweets');
Committhechangestotheindex:
SELECT*FROMgptext.commit_index('demo.public.tweets');
SeeSearchingPartitionedTablesforexamplesofsearchingpartitionedtables.
©CopyrightPivotalSoftware,Inc,2013-2017 30 2.1.2
QueryingGPTextIndexesToretrievedata,yousubmitaquerythatperformsasearchbasedoncriteriathatyouspecify.Simplequeriesreturnstraight-forwardresults.Youcanusethedefaultqueryparser,orspecifyadifferentqueryparseratquerytime.
CreatingaSimpleSearchQueryAfteraSolrindexiscommitted,youcanrunquerieswiththe gptext.search() function,whichhasthissyntax:
gptext.search(<src_table>,<index_name>,<search_query>,<filter_queries>[,<options>])
ANDsearchexamplewithtop5resultsThissearchfindsdocumentsthatcontainbothsearchterms“solar”and“battery”.The 'rows=5' argumentisaSolroptionthatspecifiesthetop5resultsaretobereturnedfromeachsegment.InaGreenplumDatabaseclusterwithtwosegments,thisqueryreturnsupto10rows.
SELECTa.id,a.date_time,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','solarANDbattery',null,'rows=5')qWHEREq.id=a.idORDERBYscoreDESC;id|date_time|title|score----------+------------------------+---------------------+-----------13690575|2017-08-2402:34:00-05|Solarpower|2.71286582008322|2017-08-0502:09:00-05|Vehicle-to-grid|2.58101534711003|2017-08-1018:56:00-05|Osmoticpower|2.207300725784|2017-08-2607:10:00-05|Renewableenergy|2.1295567213555|2017-08-2712:48:00-05|Solarupdrafttower|2.021064827743|2017-08-2015:56:00-05|Solarenergy|1.6916461608623|2017-08-2703:56:00-05|Ethanolfuel|1.4619896(7rows)
SeeSolroptionsformoreaboutSolroptions.
ORsearchexamplewithtop5resultsByusingtheORkeyword,thissearchmatchesmoredocumentsthantheANDexample.Thetotalnumberofrowsreturnedislimitedbythe rows=5 Solroption.
SELECTa.id,a.date_time,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','solarORbattery',null,'rows=5')qWHEREq.id=a.idORDERBYscoreDESC;id|date_time|title|score---------+------------------------+---------------------+-----------2008322|2017-08-0502:09:00-05|Vehicle-to-grid|2.581015325784|2017-08-2607:10:00-05|Renewableenergy|2.12955672120798|2017-01-2800:59:00-06|Lithiumeconomy|2.0416002213555|2017-08-2712:48:00-05|Solarupdrafttower|2.021064827743|2017-08-2015:56:00-05|Solarenergy|1.6916461608623|2017-08-2703:56:00-05|Ethanolfuel|1.4619896533423|2017-08-2800:52:00-05|Solarwaterheating|1.02390722988035|2017-03-1206:39:00-05|Vortexengine|0.9519546113728|2017-08-1509:59:00-05|Geothermalenergy|0.680103555017|2017-08-2819:24:00-05|Fusionpower|0.6432224(10rows)
Searchnon-defaultfields
©CopyrightPivotalSoftware,Inc,2013-2017 31 2.1.2
AGPTextindexhasadefaultsearchcolumn,specifiedwhentheindexiscreatedwiththe gptext.create_index() function.Ifyouhaveincludedadditionalcolumnstoindex,youcanreferencetheminyourqueries.Thisquerysearchesfordocumentswiththeword“solar”inthe title column.
SELECTa.id,a.date_time,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','title:"solar"',null,null)qWHEREq.id=a.idORDERBYscoreDESC;id|date_time|title|score----------+------------------------+---------------------+-----------13690575|2017-08-2402:34:00-05|Solarpower|1.654772927743|2017-08-2015:56:00-05|Solarenergy|1.6547729533423|2017-08-2800:52:00-05|Solarwaterheating|1.1132113213555|2017-08-2712:48:00-05|Solarupdrafttower|1.1132113(4rows)
Thisexamplesearchesforarticlesthathave“photosynthesis”inthe content columnbutthatdonothave“solar”inthe title column.
SELECTa.id,a.date_time,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','photosynthesisand-title:solar',null,null)qWHEREq.id=a.idORDERBYscoreDESC;id|date_time|title|score----------+------------------------+------------------+-----------25784|2017-08-2607:10:00-05|Renewableenergy|2.972095553716476|2017-08-2820:40:00-05|Seaweedfuel|1.424022114205946|2017-08-2808:46:00-05|Algaefuel|1.3022419608623|2017-08-2703:56:00-05|Ethanolfuel|0.7614042(4rows)
FilteringsearchresultsAfilterqueryappliesafiltertotheresultsreturnedbythequery.The <filter_queries> argumentofthe gptext.search() functionisanarray,soyoucanapplymultiplefilterstothesearchresults.
Thefollowingexamplefindsarticlesthathavetheword“nuclear”inthe content columnandthenappliestwofilterqueriestoremovearticlesthathave“solar”inthe title columnandarticlesthatdonothave“power”inthe title column.
SELECTa.id,a.date_time,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','nuclear','{-title:solar,title:power}',null)qWHEREq.id=a.idORDERBYscoreDESC;id|date_time|title|score----------+------------------------+------------------+------------14090587|2017-08-1414:00:00-05|Low-carbonpower|1.189789755017|2017-08-2819:24:00-05|Fusionpower|1.175360913021878|2017-08-0905:03:00-05|Geothermalpower|0.99499804(3rows)
CreatingaFacetedSearchQueryFacetingbreaksqueryresultsintomultiplecategoriesandshowscountsforeachcategory.
Withthe faceted_field_search() function,thecategoriesarefieldnames.Hereisthesyntaxforthe faceted_field_search() function:
faceted_field_search(index_name,query,filter_queries,facet_fields,facet_limit,minimum)
©CopyrightPivotalSoftware,Inc,2013-2017 32 2.1.2
Inthisexample,thequeryrunsonasetofsocialmediafeedsandeliminatesspamandtruncatedfields:
SELECT*FROMgptext.faceted_field_search('twitter.public.message','*:*',null,'{spam,truncated}',0,0)
Inthisexample,thequerysearchesalltweetsinthedatasetbyusers( author_id s)whohavecreatedatleastfivetweets:
SELECT*FROMgptext.faceted_field_search('twitter.public.message','*:*',null,'{author_id}',-1,5);
HighlightingSearchTermsinQueryResultsThe gptext.highlight() functionhighlightssearchtermsbyinsertingmarkuptagsbeforeandaftereachoccurrenceoftheterminresultsreturnedfromagptext.search() query.Forexample,ifthesearchtermis“iphone”,eachoccurrenceof“iphone”inthedataismarkedup:
<em>iphone</em>
Youcanchangethedefaultmarkupstringsfrom <em> and </em> bysettingthe gptext.hl_pre_tag and gptext.hl_post_tag serverconfigurationparameters.
Theindexmusthavebeencreatedwithtermsenabled.Use gptext.enable_terms() toenabletermvectorsandthereindexthedataifitwasalreadyindexed.See gptext.enable_terms() inthePivotalGPTextUser’sGuide.
Important:Highlightingaddsoverheadtothequery,includingindexspace,indexingtime,andsearchtime.
The gptext.highlight() syntaxis:
gptext.highlight(column_data,column_name,offsets)
The column_data parametercontainsthetextdatathatwillbemarkedupwithhighlightingtags.
The column_name parameteristhenameofthecorrespondingtablecolumn.
The offsets parameterisa gptexthstore typethatcontainskey-valuepairsthatspecifythelocationsofthesearchterminthetextdata.Thisvalueisconstructedbythe gptext.search() functionwhenhighlightingisenabled.Thekeycontainsthecolumnnameandthevalueisacomma-separatedlistofoffsetswherethedataappears.
Toenablehighlightinginagptext.search()query,addthe hl and hl.fl options:
hl=true&hl.fl=field1,field2
Settingthe hl=true optionenableshighlightingforthesearch.The hl.fl optionspecifiesalistofthefieldnamestohighlight.
Example-Top10results,withsearchtermshighlighted
SELECTt.id,gptext.highlight(t.message,'message',s.hs)FROMtwitter.messaget,gptext.search(TABLE(SELECT1SCATTERBY1),'demo.twitter.message','{!gptextqp}iphone',null,'rows=10&hl=true&hl.fl=message')sWHEREt.id=s.id::int8;
SearchingPartitionedTablesAGPTextindexforapartitionedGreenplumDatabasetablehasa __partition fieldthatcontainsthenameofthechildpartition.Whenyouquerytheindex,youcanusethe __partition fieldtorestrictthepartitionstosearch.
©CopyrightPivotalSoftware,Inc,2013-2017 33 2.1.2
Searchallpartitionsinanindexbycalling gptext.search() withtherootpartitionname:
SELECT*FROMgptext.search(TABLE(SELECT1SCATTERBY1),'demo.public.tweets','birthday',null);
Youcansearchasinglepartitionbycalling gptext.search() withthechildpartitionname.Usethe gptext.index_status(<index_name>) functiontoseethepartitionnames.Forexample:
SELECT*FROMgptext.partition_status('demo.public.tweets');partition_name|inherits_name|level------------------------------+--------------------+-------demo.public.tweets_1_prt_1|demo.public.tweets|1demo.public.tweets_1_prt_2|demo.public.tweets|1demo.public.tweets_1_prt_3|demo.public.tweets|1demo.public.tweets_1_prt_4|demo.public.tweets|1demo.public.tweets_1_prt_5|demo.public.tweets|1demo.public.tweets_1_prt_6|demo.public.tweets|1demo.public.tweets_1_prt_7|demo.public.tweets|1demo.public.tweets_1_prt_8|demo.public.tweets|1demo.public.tweets_1_prt_9|demo.public.tweets|1demo.public.tweets_1_prt_10|demo.public.tweets|1...
Thisexamplespecifiesasinglepartitiontosearch:
SELECT*FROMgptext.search(TABLE(SELECT1SCATTERBY1),'demo.public.tweets_1_prt_6','birthday',null);
Youcanalsospecifyapartitionnameorarangeofpartitionsinthequeryfilterargumentofthe gptext.search() function.Thisexamplesearchesthetenpartitionsbetween tweets_1_prt_11 and tweets_1_prt_20 .
SELECT*FROMgptext.search(TABLE(SELECT1SCATTERBY1),'demo.public.tweets','birthday','{''[__partition:[tweets_1_prt_11TOtweets_1_prt_20]''}');
RetrievingAdditionalStoredFieldsAGPTextindexcanbeconfiguredtostoreadditionalfieldswhendocumentsareindexedwiththe gptext.index() function.TheadditionalstoredfieldscanbereturnedwithsearchresultssothatitisunnecessarytojoinGPTextsearchresultswiththeoriginaltableintheGreenplumDatabase.
SeeStoringAdditionalFieldsinanIndexforinstructionstoconfiguretheindextostoretheadditionalfields.
RetrievetheadditionalfieldvaluesinaGPTextsearchbyspecifyingthelistoffieldsinthe gptext.search() optionsargument.Inthisexample,thewikipedia.public.articles indexisconfiguredtostorethefields id , date_time ,and title .
SELECT*FROMgptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','+grid+economy',null,'fl=id,date_time,title');id|score|hs|rf----------+-----------+----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------13690575|2.1322436||column_value{name:"id"value:"13690575"}column_value{name:"date_time"value:"2017-08-24T08:34:00Z"}column_value{name:"title"value:"Solarpower"}27743|1.9176841||column_value{name:"id"value:"27743"}column_value{name:"date_time"value:"2017-08-20T21:56:00Z"}column_value{name:"title"value:"Solarenergy"}113728|1.3315439||column_value{name:"id"value:"113728"}column_value{name:"date_time"value:"2017-08-15T15:59:00Z"}column_value{name:"title"value:"Geothermalenergy"}14205946|1.0533204||column_value{name:"id"value:"14205946"}column_value{name:"date_time"value:"2017-08-28T14:46:00Z"}column_value{name:"title"value:"Algaefuel"}533423|2.3232973||column_value{name:"id"value:"533423"}column_value{name:"date_time"value:"2017-08-28T06:52:00Z"}column_value{name:"title"value:"Solarwaterheating"}25784|2.2123294||column_value{name:"id"value:"25784"}column_value{name:"date_time"value:"2017-08-26T13:10:00Z"}column_value{name:"title"value:"Renewableenergy"}7906908|1.9307073||column_value{name:"id"value:"7906908"}column_value{name:"date_time"value:"2017-08-28T06:00:00Z"}column_value{name:"title"value:"Biomass"}(7rows)
©CopyrightPivotalSoftware,Inc,2013-2017 34 2.1.2
Toretrieveallfieldsstoredintheindex,usethe * wildcardforthefieldlist: 'fl=*' .
Intheresults,therequestedfieldsarepackedintoanextrafieldnamed rf withtheformat:
column_value{name:"<field1_name>"value:"<field1_value>"}[column_value{name:"<field2_name>"value:"<field2_value>"}]...
TheGPTextfunction gptext.gptext_retrieve_field(rf,<column_name>) retrievesasinglefieldvaluefromthisstructureasatextvalue.Therearevariationstoretrievethefieldvaluesas int or float values.Ifthespecifiedfieldnamedoesnotexistinthe rf structure,thefunctionreturns NULL .Thisexampleshowshowtousethesefunctionstounpackthesearchresultsintoseparatecolumns.
SELECTid,score,gptext.gptext_retrieve_field_int(rf,'id')id,to_timestamp(gptext.gptext_retrieve_field(rf,'date_time'),'YYYY-MM-DDHH24:MI:SS')date,gptext.gptext_retrieve_field(rf,'title')titleFROMgptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','+grid+economy',null,'fl=*');id|score|id|date|title----------+-----------+----------+------------------------+---------------------533423|2.3232973|533423|2017-08-2806:52:00-05|Solarwaterheating25784|2.2123294|25784|2017-08-2613:10:00-05|Renewableenergy7906908|1.9307073|7906908|2017-08-2806:00:00-05|Biomass13690575|2.1322436|13690575|2017-08-2408:34:00-05|Solarpower27743|1.9176841|27743|2017-08-2021:56:00-05|Solarenergy113728|1.3315439|113728|2017-08-1515:59:00-05|Geothermalenergy14205946|1.0533204|14205946|2017-08-2814:46:00-05|Algaefuel(7rows)
UsingAdvancedQueryingOptionsWhenyousubmitaquery,Solrprocessesthequeryusingaqueryparser.ThereareseveralSolrqueryparserswithdifferentcapabilities.Forexample,theComplexPhraseQueryParser canparsewildcards,andthe SurroundQueryParser supportsspanqueries:findingwordsinthevicinityofasearchwordinadocument.
Youcanusethemostappropriateparserforyourquery(seeChangingtheQueryParseratQueryTime).
GPTextsupportsthesequeryparsers:
1. QParserPlugin ,thedefaultGPTextqueryparser. QParserPlugin isasupersetofthe LuceneQParserPlugin ,Solr’snativeLucenequeryparser.QParserPlugin isageneralpurposequeryparserwithbroadcapabilities. QParserPlugin doesnotsupportspanqueriesandhandlesoperatorprecedenceinanunintuitivemanner.Thesupportforfieldselectionisalsoratherweak.Seehttp://wiki.apache.org/solr/SolrQuerySyntax .
2. The ComplexPhraseQueryParser ,whichsupportswildcards,ORs,ranges,andfuzziesinsidephrasequeries.Seehttps://issues.apache.org/jira/browse/SOLR-1604 .
3. The SurroundQueryParser ,whichsupportsthefamilyofspanqueries.Seehttp://wiki.apache.org/solr/SurroundQueryParser .
4. The DisMax (or eDisMax )QueryParser,whichhandlesoperatorprecedenceinanintuitivemannerandisbestsuitedforuserqueries.Seehttp://wiki.apache.org/solr/DisMaxQParserPlugin .
5. TheUnifiedQueryParsercanuseallthequeryparsersincombination.SeeUsingtheUniversalQueryParserformoreinformation.
Agoodgeneralreferenceforqueryparserscanbefoundat:http://www.lucidimagination.com/blog/2009/02/22/exploring-query-parsers .
ChangingtheQueryParseratQueryTimeYoucanchangethequeryparseratquerytimewiththe defType Solroptioninthe gptext.search() functionthatsupportsSolroptions.Forexample,thisqueryusesthe dismax queryparsertoreturnthetop100resultsfor“olympics”,withnofilterquery,andusingthedefaultsearchfield:
Top100results,nofilterquery,defaultsearchfieldused:
©CopyrightPivotalSoftware,Inc,2013-2017 35 2.1.2
SELECTw.title,q.scoreFROMwikipediaw,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','olympics',null,'rows=100&defType=dismax')qWHEREw.id=q.id::int8;
The options parameterincludes‘defType=dismax’.
InvoketheUniversalQueryParserbyinserting {!gptextqp}<searchquery> inthequeryparameterofthe gptext.search() function.
UsingtheSolrlocalParamssyntaxYoucanusetheSolr localParams syntaxwithallGPTextsearchfunctionstochangethequeryparseratquerytimebyreplacingthe <query> termwith'{!type=dismax}<query>' .
Note:Thedefaultqueryparserisspecifiedinthe requestHandler definitionsin solrconfig.xml .Youcanedit solrconfig.xml withthemanagementutilitygptext-config .
UsingtheUniversalQueryParserWiththeUniversalQueryParser,youcanperformsearchesusingfeaturesofanyofthequeryparsersinonesearchstring.InvoketheUniversalQueryParserwiththe gptext.search() function’s search_query parameterinthisformat:
'{!gptextqp}search_query'
Example-ThefollowingqueryillustratesusingtheUniversalQueryParser.
SELECTw.title,q.scoreFROMwikipediaw,gptext.search(TABLE(SELECT1SCATTERBY1),'wikipedia.public.articles','{!gptextqp}mobil*2W(ObamaORClinton)',null)qWHEREw.id=q.id::int8;
Thesearchqueryinthisexampleincludessyntaxforthreequeryparsers:
mobil* –ComplexQuerywithwildcard
2W –ProximityQuery,requestingamaximumoftwowordsproximitybetweenthephrases
Obama OR Clinton –LuceneQuery
©CopyrightPivotalSoftware,Inc,2013-2017 36 2.1.2
CustomizingGPTextIndexesWhenyoucreateaGPTextindex,Solrcreatesasetofconfigurationfilesfortheindex.The managed-schema filecontainsdefinitionsforthefields,fieldtypes,andtextanalyzersthatdefinethecontentsandbehavioraGPTextindex.
AfieldmapsaGreenplumDatabasetablecolumnstoafieldintheGPTextindex.
AfieldtypeassignsSolrclassesthathandleadatatypetoafield.
AnanalyzerchainisasetofJavaclassesthattokenizeandfilterthecontentofafieldtobeindexed.
Thistopicprovidesanoverviewofthecontentsofthe managed-schema fileandtherelationshipsbetweentheXMLelementsthatdefinefields,fieldtypes,ananalyzerchains.Byeditingthe managed-schema file,youcanspecifyatthefieldlevelhowGreenplumDatabasedataisstoredandindexed.
Fordetaileddocumentation,refertothecommentsinthe managed-schema fileortotheApacheSolrClouddocumentation.
Editingthemanaged-schemaFileYoucaneditconfigurationfilesusingthe gptext-config utility.Thefollowingcommandloadsthe managed-schema fileforanindexintoaneditor:
$gptext-config-i<index-name>-fmanaged-schema
FieldElementsGPTextadds field elementstothe managed-schema fileforcolumnsincludedwhentheindexwascreatedwiththe gptext.create_index() function.Thisexampleisthedefinitionforatextfieldnamed description :
<fieldname="description"stored="false"type="text_intl"indexed="true"/>
The name attributeisthenameofthedatabasecolumn.IfthecolumnnameisnotavalidSolrfieldname,itisalteredtoconform.
The stored attributedeterminesifthecontentofthefieldwillbestoredintheindex.Ifthefieldisstoredintheindex,GPTextsearchresultscanreturnthecontentofthefield.Iftheattributeisnotstored,retrievingthefieldcontentrequiresaSQLjoin.
The type attributemapstheGreenplumDatabasetypetoaSolrtype,definedinthesamefilewitha <fieldType> element.
The indexed attributedetermineswhetherthefieldcontentwillbeindexed.
The <field> elementcanhaveadditionalattributesusedwithsometypes.Seethecommentafterthe <fields> elementforacompletelistofattributes.
FieldTypesThe type attributeofthe <field> elementdetermineshowSolrstoresthefieldinindexes.The class attributemapsthefieldtypetoaSolrJavaclassthatrecognizesandprocessesthedatatype.Solrincludesmanybasefieldtypes.SeeGPTextandSolrDataTypeMappingsforamappingofSolrtypestoGreenplumDatabasetypes.Ifyouhaveacustomtype,youcanaddnewfieldtypesbyimplementingSolrJavatypeinterfaces,oryoucanspecifyanexistingbasetypeandcustomizeitwithananalyzerchain,asdescribedinthenextsection.
AnalyzerChainsTextanalysischainsdeterminehowSolrindexestextfieldsinadocument.AnanalyzerchainisasetofJavaclassesthattokenizeandfilterthecontentbeforeitisaddedtotheindex.Differentanalysischainscanbedefinedforindexingandqueryingoperations.
Fieldanalysisbeginswithatokenizerthatdividesthecontentsofafieldintotokens.InLatin-basedtextdocuments,thetokensarewords(alsocalledterms).InChinese,Japanese,andKorean(CJK)documents,thetokensarecharacters.
Thetokenizercanbefollowedbyoneormorefiltersexecutedinsuccession.Filtersrestrictthequeryresults,forexample,byremovingunnecessaryterms(“a”,“an”,“the”),convertingtermformats,orbyperformingotheractionstoensurethatonlyimportant,relevanttermsappearintheresultset.Eachfilteroperatesontheoutputofthetokenizerorfilterthatprecedesit.Solrincludesmanytokenizersandfiltersthatallowanalyzerchainstoprocessdifferentcharactersets,languages,andtransformations.SeeAnalyzers,TokenizersandFilters-Thefulllist foracomprehensivelist.
©CopyrightPivotalSoftware,Inc,2013-2017 37 2.1.2
Fieldtypesareassignedanalyzersinanindex’s managed-schema file.ThefollowingexampleshowstheSolr text fieldtypespecification:
<fieldTypename="text"class="solr.TextField"positionIncrementGap="100"autoGeneratePhraseQueries="true"><analyzertype="index"><tokenizerclass="solr.WhitespaceTokenizerFactory"/><!--inthisexample,wewillonlyusesynonymsatquerytime<filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="false"/>--><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.WordDelimiterFilterFactory"generateWordParts="1"generateNumberParts="1"catenateWords="1"catenateNumbers="1"catenateAll="0"splitOnCaseChange="1"/><filterclass="solr.LowerCaseFilterFactory"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="solr.PorterStemFilterFactory"/></analyzer><analyzertype="query"><tokenizerclass="solr.WhitespaceTokenizerFactory"/><filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="true"/><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.WordDelimiterFilterFactory"generateWordParts="1"generateNumberParts="1"catenateWords="0"catenateNumbers="0"catenateAll="0"splitOnCaseChange="1"/><filterclass="solr.LowerCaseFilterFactory"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="solr.PorterStemFilterFactory"/></analyzer></fieldType>
Ananalyzerhasonlyonetokenizer, solr.WhitespaceTokenizerFactory inthisexample.Thetokenizercanbefollowedbyoneormorefiltersexecutedinsuccession.
Filtersrestrictthequeryresults.Eachfilteroperatesontheoutputofthetokenizerorfilterthatprecedesit.Forexample,the solr.StopFilterFactory filterremovesunnecessaryterms(“a”,“an”,“the”)fromthestreamoftokens.Thewordstofilteroutofthestreamarelistedinthe stopwords.txt configurationfile.Youcaneditthe stopwords.txt filewiththe gptext-config utilitytochangethelistofwordsexcludedfromtheindex.
Thereareseparateanalyzertypesforindexandqueryoperations.Thequeryanalyzerchaininthisexampleincludesa solr.SynonymFilterFactory thatlooksupeachtokeninafile synonyms.txt and,iffound,returnsthesynonyminplaceofthetoken.
Theanalysischaincanincludea“stemmer”, solr.PorterStemFilterFactory inthisexample.Thestemmeremploysanalgorithmtochangewordstotheir“stems”.Forexample,“confidential”,“confidentiality”,and“confidentis”areallstemmedto“confidenti”.Usingastemmercandramaticallyreducethesizeoftheindex,butusersexecutingsearchesshouldbeawarethatsomesearchexpressionswillnotworkasexpectedbecauseofstemming.Forexample,searchingwithawildcardsuchas "confidential*" willreturnnomatchesbecausethewordswerestemmedto“confidenti”duringindexing.Withoutawildcard,thewordinthesearchexpressionisalsostemmedandthereforethesearchsucceeds.
GPTextTextAnalysisChainsInadditiontothetextanalysischainsSolrprovides,GPTextprovidesthefollowingtextanalysischains:
text_intl,theInternationalTextAnalyzer
text_sm,theSocialMediaTextAnalyzer
text_intl,theInternationalTextAnalyzertext_intl isthedefaultGPTextanalyzer.Itisamultiplelanguagetextanalyzerfor text fields.IthandlesLatin-basedwordsandChinese,Japanese,andKorean(CJK)characters.
text_intl processesdocumentsasfollows.
1. SeparatesCJKcharactersfromotherlanguagetext.
2. Identifiescurrencytokensorsymbolsthatwereignoredinthefirstpass.
3. ForanyCJKcharacters,generatesabigramfortheCJKcharacterand,forKoreancharactersonly,preservestheoriginalword.
NotethatCJKandnon-CJKtextaretreatedasseparatetokens.PreservingtheoriginalKoreanwordincreasesthenumberoftokensinadocument.
FollowingisthedefinitionfromtheSolr managed-schema template.
©CopyrightPivotalSoftware,Inc,2013-2017 38 2.1.2
<fieldTypeautoGeneratePhraseQueries="true"class="solr.TextField"name="text_intl"positionIncrementGap="100">
<analyzertype="index"><tokenizerclass="com.emc.solr.analysis.worldlexer.WorldLexerTokenizerFactory"/><filterclass="solr.CJKWidthFilterFactory"/><filterclass="solr.LowerCaseFilterFactory"/><filterclass="com.emc.solr.analysis.worldlexer.WorldLexerBigramFilterFactory"han="true"hiragana="true"katakana="true"hangul="true"/><filterclass="solr.StopFilterFactory"enablePositionIncrements="true"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="solr.PorterStemFilterFactory"/></analyzer><analyzertype="query"><tokenizerclass="com.emc.solr.analysis.worldlexer.WorldLexerTokenizerFactory"/><filterclass="solr.CJKWidthFilterFactory"/><filterclass="com.emc.solr.analysis.worldlexer.WorldLexerBigramFilterFactory"han="true"hiragana="true"katakana="true"hangul="true"/><filterclass="solr.StopFilterFactory"enablePositionIncrements="true"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="solr.PorterStemFilterFactory"/></analyzer></fieldType>
Followingaretheanalysisstepsfor text_intl .
1. Theanalyzerchainforindexingbeginswithatokenizercalled WorldLexerTokenizerFactory .Thistokenizerhandlesmostmodernlanguages.ItseparatesCJKcharactersfromotherlanguagetextandidentifiesanycurrencytokensorsymbols.
2. The solr.CJKWidthFilterFactory filternormalizestheCJKcharactersbasedoncharacterwidth.
3. The solr.LowerCaseFilterFactory filterchangesallletterstolowercase.
4. The WorldLexerBigramFilterFactory filtergeneratesabigramforanyCJKcharacters,leavesanynon-CJKcharactersintact,andpreservesoriginalKorean-languagewords.Setthe han , hiragana , katakana ,and hangul attributesto "true" togeneratebigramsforallsupportedCJKlanguages.
5. The solr.StopFilterFactory removescommonwords,suchas“a”,“an”,and“the”,whicharelistedinthe stopwords.txt configurationfile(seeToconfigureanindex).Iftherearenowordsinthe stopwords.txt file,nowordsareremoved.
6. The solr.KeywordMarkerFilterFactory markstheEnglishwordstoprotectfromstemming,usingthewordslistedinthe protwords.txtconfigurationfile(seeToconfigureanindex).If protwords.txt doesnotcontainalistofwords,allwordsinthedocumentarestemmed.
7. Thefinalfilteristhestemmer,inthiscase solr.PorterStemFilterFactory ,afaststemmerfortheEnglishlanguage.
Note:The text_intl analyzerchainforqueryingisthesameasthe text analyzerchainforindexing.
Ananalysischain, text ,isincludedinGPText’sSolr managed-schema andisbasedonSolr’sdefaultanalysischain.Becauseitstokenizersplitsonwhitespace, text cannotprocessCJKlanguages:whitespaceismeaninglessforCJKlanguages.Bestpracticeistousethe text_intl analyzer.
Forinformationaboutusingananalyzerchainotherthanthedefault,seeUsingthetext_smSocialMediaAnalyzer.
GPTextLanguageProcessing
Theroot-leveltokenizer, WorldLexerTokenizerFactory ,tokenizesinternationallanguages,includingCJKlanguages. WorldLexerTokenizerFactory tokenizeslanguagesbasedontheirUnicodepointsand,forLatin-basedlanguages,whitespace.
Note:UnicodeistheencodingforalltextintheGreenplumDatabase.
Thefollowingaresampleinputto,andoutputfrom,GPText.Eachlineintheoutputcorrespondstoaterm.
EnglishandCJKinput:
₩10대부분english자선단체는.
EnglishandCJKoutput:
₩10
©CopyrightPivotalSoftware,Inc,2013-2017 39 2.1.2
대부분
대부
부분
english
자선
단체는
단체
체는
Bulgarianinput:
Cъставнаnарламента:вж.nротоколи
Bulgarianoutput:
cъстав
на
nарламента
вж
протоколиа
Danishinput:
Genoptagelseafsessionen
Danishoutput:
genoptagelse
af
sessionen
text_intlFilters
Thetext_intlanalyzerusesthefollowingfilters:
The CJKWidthFilterFactory normalizeswidthdifferencesinCJKcharacters.Thisfilternormalizesallcharacterwidthstofullwidth.
The WorldLexerBigramFilterFactory filterformsbigrams(pairs)ofCJKtermsthataregeneratedfrom WorldLexerTokenizerFactory .Thisfilterdoesnotmodifynon-CJKtext.WorldLexerBigramFilterFactory acceptsattributesthatguidethecreationofbigramsforCJKscripts.Forexample,iftheinputcontainsHANGULscriptbutthe hangul attributeissetto false, thisfilterwillnotcreatebigramsforthatscript.Toensurethat WorldLexerBigramFilterFactory createsbigramsasrequired,settheCJKattributes han , hiragana , katakana ,and hangul to true .
text_sm,theSocialMediaTextAnalyzerTheGPText text_sm textanalyzeranalyzestextfromsourcessuchassocialmediafeeds. text_sm consistsofatokenizerandtwofilters.Toconfigurethetext_sm textanalyzer,usethe gptext-config utilitytoeditthe managed-schema file.SeeTousethetext_smSocialMediaAnalyzerfordetails.
text_sm normalizesemoticons:itreplacesemoticonswithtextusingthe emoticons.txt configurationfile.Forexample,itreplacesahappyfaceemoticon,:-) ,withthetext“happy”.
ThefollowingisthedefinitionfromtheSolr managed-schema template.
©CopyrightPivotalSoftware,Inc,2013-2017 40 2.1.2
<fieldTypeautoGeneratePhraseQueries="true"class="solr.TextField"name="text_sm"positionIncrementGap="100"termVectors="true"termPositions="true"termOffsets="true"><analyzertype="index"><tokenizerclass="com.emc.solr.analysis.text_sm.twitter.TwitterTokenizerFactory"delimiter="\t"emoticons="emoticons.txt"/><!--Caseinsensitivestopwordremoval.AddenablePositionIncrements=trueinboththeindexandqueryanalyzerstoleavea'gap'formoreaccuratephrasequeries.--><filterclass="solr.StopFilterFactory"enablePositionIncrements="true"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.LowerCaseFilterFactory"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="com.emc.solr.analysis.text_sm.twitter.EmoticonsClassifierFilterFactory"delimiter="\t"emoticons="emoticons.txt"/><filterclass="com.emc.solr.analysis.text_sm.twitter.TwitterStemFilterFactory"/><analyzertype="query"><tokenizerclass="com.emc.solr.analysis.text_sm.twitter.TwitterTokenizerFactory"delimiter="\t"emoticons="emoticons.txt"/><filterclass="solr.StopFilterFactory"enablePositionIncrements="true"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.LowerCaseFilterFactory"/><filterclass="solr.KeywordMarkerFilterFactory"protected="protwords.txt"/><filterclass="com.emc.solr.analysis.text_sm.twitter.EmoticonsClassifierFilterFactory"delimiter="\t"emoticons="emoticons.txt"/><filterclass="com.emc.solr.analysis.text_sm.twitter.TwitterStemFilterFactory"/></analyzer></fieldType>
TheTwitterTokenizer
TheTwittertokenizerextendstheEnglishlanguagetokenizer, solr.WhitespaceTokenizerFactory, torecognizethefollowingelementsasterms.
Emoticons
Hyperlinks
Hashtagkeywords(forexample,#keyword)
Userreferences(forexample,@username)
Numbers
Floatingpointnumbers
Numbersincludingcommas(forexample10,000)
timeexpressions(forexample,9:30)
Thetext_smfilters
com.emc.solr.analysis.socialmedia.twitter.EmoticonsClassifierFilterFactory classifiesemoticonsas happy , sad ,or wink .Itisbasedonthe emoticons.txt file(oneofthefilesyoucaneditwith gptext-config ),andisintendedforfutureuse,suchasinsentimentanalysis.
TheTwitterStemFilterFactory
com.emc.solr.analysis.socialmedia.twitter.TwitterStemFilterFactory extendsthe solr.PorterStemFilterFactory classtobypassstemmingofthesocialmediapatternsrecognizedbythe twitter.TwitterTokenizerFactory .
©CopyrightPivotalSoftware,Inc,2013-2017 41 2.1.2
Theemoticons.txtfile
Thisfilecontainslistsofemoticonsfor“happy,”“sad,”and“wink.”Theyareseparatedbyatabbydefault.Youcanchangetheseparationtoanycharacterorstringbychangingthevalueof delimiter inthesocialmediaanalysischain.Thefollowingisasamplelinefromthe text_sm analyzerchain:
<filterclass="com.emc.solr.analysis.text_sm.twitter.EmoticonsClassifierFilterFactory"delimiter="\t"emoticons="emoticons.txt"/>
Usingthetext_smSocialMediaAnalyzerTheSolr managed-schema filespecifiestheanalyzertousetoindexafield.Thedefaultanalyzeris text_intl .Tospecifythe text_sm socialmediaanalyzer,youusethe gptext-config utilitytomodifytheSolr managed-schema foryourindex.
Thestepsare:
1. Createanindexusing gptext.create_index() .Theindexcontainsalinesimilartothefollowingneartheendofthefile:
<fieldname="text_search_col"indexed="true"stored="false"type="text_intl"/>
The type fieldspecifiestheanalyzertouse. text_intl isthedefault.
2. Use gptext-config toeditthe managed-schema file:
gptext-config-fmanaged-schema-i<index_name>
3. Modifythelineasfollows:
<fieldname="text_search_col"indexed="true"stored="false"type="text_sm"/>
gptext-configfetchesthe managed-schema filefromtheconfigurationfilesdirectoryforyourindexandopensitinthe vi editor.Afteryoueditthefile,saveit,andquit vi ,gptext-configreturnsthefiletoitsoriginalconfigurationfilesdirectory.
UsingMultipleAnalyzerChainsIfyouwanttoindexafieldusingtwodifferentanalyzerchainssimultaneously,youcandothis:
Createanewemptyindex.Thenusethe gptext-config utilitytoaddanewfieldtotheindexthatisacopyofthefieldyouareinterestedin,butwithadifferentnameandanalyzerchain.
Letusassumethatyourindex,asinitiallycreated,includesafieldtoindexnamed mytext .Alsoassumethatthisfieldwillbeindexedusingthedefaultinternationalanalyzer( text_intl ).
Youwanttoaddanewfieldtotheindex’s managed-schema thatisacopyof mytext andthatwillbeindexedwithadifferentanalyzer(saythe text_smanalyzer).Todoso,followthesesteps:
1. Createanemptyindexwith gptext.create_index() .
2. Opentheindex’s managed-schema fileforeditingwith gptext-config .
3. Adda <field> inthe managed-schema foranewfieldthatwilluseadifferentanalyzerchain.Forexample:<fieldindexed="true"name="mytext2"stored="false"type="text_sm"/>
Bydefiningthetypeofthisnewfieldtobe text_sm ,itwillbeindexedusingthesocialmediaanalyzerratherthanthedefault text_intl .
4. Adda <copyField> in managed-schema tocopytheoriginalfieldtothenewfield.Forexample:<copyFielddest="mytext2"source="mytext"/>
5. Indexandcommitasyounormallywould.
Thedatabasecolumn mytext isnowintheindextwicewithtwodifferentanalyzerchains.Onecolumnis mytext ,whichusesthedefaultinternational
©CopyrightPivotalSoftware,Inc,2013-2017 42 2.1.2
analyzerchain,andtheotheristhenewlycreated mytext2, whichusesthesocialmediaanalyzerchain.
UsingDifferentAnalyzerChainsforIndividualFieldsYoucanusedifferentanalyzersforindividualfieldsbyeditingthemanaged-schemaconfigurationfile.Forexample,ifonefieldcontainsEnglishtextandanothercontainsChineselanguagetext,youcanspecifydifferentanalyzersforthetwofields.
ExampleYouhaveatablenamed email_tbl withthefollowingdefinition:
createtableemail_tbl(idbigint,english_contenttext,chinese_contenttext,timestampdate,usernametext,ageint,...)#additionalcolumnsthatarenotindexed
Youwanttoindexthesixcolumnsshown— id , english_content , chinese_content , timestamp , username ,and age .
Forthecolumn english_content ,youwanttousetheEnglishlanguageanalyzercalled“text_en”forthetextsegmentation.
Forthecolumn chinese_content ,youwanttousetheinternationallanguageanalyzernamed“text_intl”.
Herearestepstoimplementthisexample:
1. CreatetheGPTextindexforthetable.
SELECT*FROMgptext.create_index('public','email_tbl','id','english_content');
2. Modifytheanalyzerforeachcolumnin managed-schema .
$gptext-config-idb.public.email_tbl-fmanaged-schema
3. Findtheelementforthe english_content field.
<fieldname="english_content"type="*"indexed="true"stored="true"/>
Changethe type attributeto text_en .
<fieldname="english_content"type="text_en"indexed="true"stored="true"/>
4. Findtheelementforthe chinese_content field.
<fieldname="chinese_content"type="*"indexed="true"stored="true"/>
Changethe type attributeto text_intl .
<fieldname="chinese_content"type="text_intl"indexed="true"stored="true"/>
5. Indexthetable.
SELECT*FROMgptext.index(TABLE(SELECTid,english_content,chinese_content,timestamp,username,ageFROMemail_tbl),'db.public.email_tbl');
6. Committheindex.
SELECT*FROMgptext.commit_index('db.public.email_tbl');
Thefieldtypes text_en and text_intl aredefinedin <fieldType> entriesinthemanaged-schemafileandthenreferencedinthe type attributeofthe
©CopyrightPivotalSoftware,Inc,2013-2017 43 2.1.2
<field> element.
Youcandefineacustomfieldtypebyaddinga <fieldType> entrywithcustomanalyzersandthensettingthefield’s type attributetothenameofthecustomfieldtype.Forexample,thefollowing“text_customize”fieldtypeisacopyofthe“text_en”fieldtypeentrywiththesynonymfiltercommentedoutintheindexanalyzer.Thiscustomfieldtypewillapplythesynonymfiltertoqueries,butnottotheindex.
<fieldTypename="text_customize"class="solr.TextField"positionIncrementGap="100"><analyzertype="index"><tokenizerclass="solr.StandardTokenizerFactory"/><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><!--inthisexample,wewillonlyusesynonymsatquerytime<filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="false"/>--><filterclass="solr.LowerCaseFilterFactory"/></analyzer><analyzertype="query"><tokenizerclass="solr.StandardTokenizerFactory"/><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="true"/><filterclass="solr.LowerCaseFilterFactory"/></analyzer></fieldType>
Afieldtypecanalsobecustomizedbyaddinganalyzersaschildelementsofthe <field> element:
<fieldname="english_content"type="text"indexed="true"stored="false"><analyzertype="index"><tokenizerclass="solr.StandardTokenizerFactory"/><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><!--inthisexample,wewillonlyusesynonymsatquerytime<filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="false"/>--><filterclass="solr.LowerCaseFilterFactory"/></analyzer><analyzertype="query"><tokenizerclass="solr.StandardTokenizerFactory"/><filterclass="solr.StopFilterFactory"ignoreCase="true"words="stopwords.txt"/><filterclass="solr.SynonymFilterFactory"synonyms="synonyms.txt"ignoreCase="true"expand="true"/><filterclass="solr.LowerCaseFilterFactory"/></analyzer></field>
©CopyrightPivotalSoftware,Inc,2013-2017 44 2.1.2
AdministeringGPTextGPTextadministrationincludessecurityconsiderations,monitoringSolrindexstatistics,andtroubleshooting.
ChangingGPTextServerConfigurationParametersConfigurationparametersusedwithGPTextarebuilt-intoGPTextwithdefaultvalues.YoucanchangethevaluesfortheseparametersbysettingthenewvaluesinaGreenplumDatabasesession.ThenewvaluesarestoredinZooKeeper.GPTextindexesusethevaluesofconfigurationparameterswhentheyarecreated.Changingconfigurationparametersaffectsnewindexes,butdoesnotaffectexistingindexes.
SeeGPTextConfigurationParametersforacompletelistofconfigurationparameters.
Aone-timeGreenplumDatabaseconfigurationchangeisneededforGreenplumDatabasetoallowsettinganddisplayingGPTextconfigurationvariables.Asthe gpadmin user,enterthefollowingcommandsinashell:
$gpconfig-ccustom_variable_classes-v'gptext'$gpstop-u
ThenconnecttoadatabasethatcontainstheGPTextschemaandexecutethe gptext.version() functiontoexposetheGPTextconfigurationvariables:
twitter=#select*fromgptext.version();
ChangethevaluesofGPTextconfigurationvariablesusingthe SET commandinasessionwithadatabasethatcontainstheGPTextschema.Thefollowingexamplesetsvaluesforthreeconfigurationparametersina psql session:
twitter=#setgptext.idx_buffer_size=10485760;SETtwitter=#setgptext.idx_delim='|';SETtwitter=#setgptext.extension_factor=5;SET
Youcanviewthecurrentvalueofaconfigurationparameterthatyouhavesetusingthe SHOW command:
twitter=#showgptext.idx_delim;gptext.idx_delim------------------|(1row)
SecurityandGPTextIndexesGPTextsecurityisbasedonGreenplumDatabasesecurity.YourprivilegestoexecuteGPTextfunctionsdependonyourprivilegesforthedatabasetablethatisthesourcefortheindex.Forexample,ifyouhaveSELECTprivilegesforatableintheGreenplumdatabase,thenyouhaveSELECTprivilegesforanindexgeneratedfromthattable.
ExecutingGPTextfunctionsrequiresoneofOWNER,SELECT,INSERT,UPDATE,orDELETEprivileges,dependingonthefunction.TheOWNERisthepersonwhocreatedthetableandhasallprivileges.SeetheGreenplumDatabaseAdministratorGuideforinformationaboutsettingprivileges.
CheckingZooKeeperStatusUsethezkManagerutilityfromthecommandlinetochecktheZooKeeperclusterstatus.IftheZookeeprclusterisboundtoGPText,youcanstartandstoptheclusterusingzkManager.
TochecktheZooKeeperclusterstatus,runthefollowingcommand:
zkManagerstate
©CopyrightPivotalSoftware,Inc,2013-2017 45 2.1.2
Theutilityliststhehosts,ports,latency,andfollower/leadermodeforeachZooKeeperinstance.Ifanodeisdown,it’smodeislistedasDown.
IftheZooKeeperclusterwasinstalledbytheGPTextinstaller,thezkManagerutilitycanbeusedtostartorstoptheZooKeepercluster.Tostartthecluster,runthefollowingcommand:
zkManagerstart
TostopZooKeeper,runthiscommand:
zkManagerstop
CheckingSolrCloudStatusYoucancheckthestatusoftheSolrCloudclusterandindexesbyrunningthe gptext-state utilityfromthecommandline.
TocheckthestateoftheGPTextnodesandeachindex,runthe gptext-state utilitywiththe -D ( --details )option:
gptext-state-D
ThiscommandreportsthestatusoftheGPTextnodesandstatusofeachGPTextindex.
Run gptext-statelist toviewjusttheindexes.
The gptext-statehealthcheck commandcheckstheGPTextconfigurationfiles,theindexstatus,requireddiskspace,userprivileges,andindexanddatabaseconsistency.Bydefault,therequireddiskspacecheckpassesifthereisatleast20%diskfree.Youcansetadifferentdiskfreethresholdusingthe--disk_free option.Forexample:
[gpadmin@gpdb-sandbox~]$gptext-statehealthcheck--disk_free=2520160629:15:45:24:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-ExecutehealthcheckonGPTextcluster!20160629:15:45:24:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextconfigfiles...20160629:15:45:24:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160629:15:45:24:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextindexstatus...20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforrequireddiskspace...20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforrequireduserprivileges...20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160629:15:45:25:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforindexesanddatabaseconsistency...20160629:15:45:27:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160629:15:45:27:669652gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.
Seethe gptext-state utilityreferenceforadditionaloptions.
RecoveringGPTextNodesUsethe gptext-recover utilitytorecoverdownGPTextnodes,forexampleafterafailedGreenplumsegmenthostisrecovered.
Withnoarguments,the gptext-recover utilitydiscoversdownGPTextnodesandrestartsthem.
Withthe -f (or --force )option,ifaGPTextnodecannotberestartedandnoshardsaredown,thenodeisdeletedandcreatedagainonthesamehost.Missingreplicasareaddedandthefailednodeandfailedreplicasareremoved.
The -H ( --new_hosts )optionallowsrecreatingdownGPTextnodesonnewhoststhatreplacefailedhosts.ThedownGPTextnodesaredeletedandrecreatedonthenewhosts.Theargumenttothe -H optionisacomma-separatedlistofthenewhoststhataretoreplacethefailedhosts.Thenumberofnewhostsmustmatchthenumberoffailedhosts.Ifshardsaredown,itadvisesreindexing.Ifonlysomereplicasaredown,itrecreatesthereplicasonthenewhostsandupdates gptext.conf .
The -r optionrecoversreplicas,butdoesnotattempttorecoveranydownnodes.
Note:BeforerecoveringGPTextnodesonnewlyaddedhosts,ensurethatthefollowingGPTextprerequisiteshavebeeninstalledonthehost:
OracleJava1.6
©CopyrightPivotalSoftware,Inc,2013-2017 46 2.1.2
Python2.6
TheLinux lsof utility
ViewingSolrIndexStatisticsYoucanviewSolrindexstatisticsbyrunningthe gptext-state utilityfromthecommandline.
TolistallGPTextindexes,enterthefollowingcommandatthecommandline:
gptext-statelist
Acommandlinethatretrievesallstatisticsforanindex:
gptext-state--indexwikipedia.public.articles
Acommandlinethatretrievesthenumberofdocumentsinanindex:
gptext-state--indexwikipedia.public.articles--stats_columns=num_docs
Acommandlinethatretrieves num_docs andtheindex size :
gptext-state--indexwikipedia.public.articles--stats_columnsnum_docs,size
BackingUpandRestoringGPTextIndexesWiththe gptext-backup managementutility,youcanbackupaGPTextindextoashareddirectorysothat,ifneeded,youcanquicklyrecoverfromafailure.ThebackupcanberestoredtothesameGPTextsystemortoanothersystemwiththesamenumberofGreenplumsegments.
The gptext-backup managementutilitybacksupanindexanditsconfigurationfilestoasharedfilesystem,whichmustbeaccessibleandwritablebyeachhostintheGreenplumcluster.The --path command-lineoptionspecifiesthelocationofadirectoryonthemountedfilesystem.The --name optionprovidesanameforthebackup.
The gptext-backup utilityfirstchecksthat:
theGPTextclusterisup
thesharedfilesystemisvalid
thedirectoryspecifiedwiththe --name optiondoesnotalreadyexistatthelocationspecifiedbythe --path option
Theutilitycreatesthenewdirectoryandsavesonecopyofeachindexshardtothatdirectory,alongwiththeindex’sconfigurationfiles.
Torestoreanindex,usethe gptext-restore managementutility.TheGPTextsystemyourestoretomustbeonaGreenplumclusterwiththesamenumberofsegments.Thedatabase,schema,andbasetablefortheindexmustbepresent.
The --index optionspecifiesthenameoftheGPTextindexthatwillberestored.Iftheindexexists,youmustfirstdropitwiththe gptext.delete() user-definedfunction.
The --path optionspecifiesthelocationofthedirectorycontainingthebackupfiles—thedirectorythat gptext-backup createdonthesharedfilesystem.
Seegptext-backupforsyntaxanddetailsforrunning gptext-backup .Seegptext-restoreforsyntaxanddetailsforrunning gptext-backup .
ExpandingtheGPTextClusterThe gptext-expand managementutilityaddsGPTextnodestothecluster.Therearetwowaystoaddnodes:
AddGPTextnodestoexistinghostsinthecluster.ThisoptionincreasesthenumberofGPTextnodesoneachhost.
AddGPTextnodestonewhostsaddedbyusingtheGreenplum gpexpand managementutilitytoexpandtheGreenplumDatabasesystem.
©CopyrightPivotalSoftware,Inc,2013-2017 47 2.1.2
AddingGPTextNodestoExistingSegmentHostsToaddnodestoexistingsegmenthosts,runthe gptext-expand utilitywithacommandlikethefollowing:
gptext-expand-e-p/data1/nodes,/data2/nodes
ThisexampleaddstwoGPTextnodestoeachhost.
The -e ( --existing )optionspecifiesthatnodesaretobeaddedtoexistinghosts.
The -p ( --expand_paths )optionprovidesalistofdirectorieswherethenewnodes’datadirectoriesaretobecreated.TheseshouldbethesamedirectoriesthatcontaintheGreenplumsegmentdatadirectoriesandexistingGPTextdatadirectories.Thenumberofdirectoriesinthelististhenumberofnewnodesthatareadded.
AdirectorycanberepeatedinthedirectorylistmultipletimestoincreasethenumberofnewGPTextnodestocreate.Forexample,ifthereiscurrentlyoneGPTextnodeperhostinthe /data1/nodes directory,youcouldaddthreenodeswithacommandlikethefollowing:
gptext-expand-e-p/data1/nodes,/data2/nodes,/data2/nodes
Thisaddsonenodetothe /data1/nodes directoryandtwonodestothe /data2/nodes directorysotherearetwoGPTextnodesineachdirectory.
AddingGPTextnodesaffectsnewindexes,butnotexistingindexes.Replicasfornewindexeswillbedistributedacrossallofthenodes,includingbotholdnodesandthenewlycreatednodes.Replicasforindexesthatexistedbeforerunning gptext-expand arenotautomaticallymoved.Rebalancingexistingreplicasrequiresreindexing.
AddingGPTextNodestoNewHostsCheckthatthefollowingGPTextprerequisitesareinstalledoneachnewhostaddedtotheGreenplumDatabasecluster:
OracleJava1.8
Python2.6orgreater
Linux lsof utility
NewhostsmustbereachablebyallhostsintheGPTextcluster,includingexistinghostsandthenewhostsyouareadding.
AfterexpandingtheGreenplumclusterwiththe gpexpand managementutility,call gptext-expand withthe -H ( --new_hosts )optionandalistofthenewhostsonwhichtoinstallGPText:
gptext-expand-Hnewhost1,newhost2
The gptext-expand utilityinstallsGPTextbinariesonthenewhostsandthencreatesnewGPTextnodesonthenewhosts.
ExpandingaGreenplumclusterincreasesthenumberofsegments,sothenumberofGPTextindexshardsforexistingindexesmustbeincreasedtoequalthenewnumberofsegments.Thisrequiresreindexingallexistingdocuments.Newlycreatedindexeswillautomaticallybedistributedamongthenewshards.
TroubleshootingGPTexterrorsareofthefollowingtypes:
Solrerrors
gptext errors
MostoftheSolrerrorsareself-explanatory.
gptext errorsarecausedbymisuseofafunctionorutility.Theyprovideamessagethattellsyouwhenyouhaveusedanincorrectfunctionorargument.
MonitoringLogs
©CopyrightPivotalSoftware,Inc,2013-2017 48 2.1.2
YoucanexaminetheGreenplumDatabaseandSolrlogsformoreinformationiferrorsoccur.GreenplumDatabaselogsresidein:
segment-directory/pg-log
Solrlogsresidein:
<GPDBpath>/solr/logs
DeterminingSegmentStatuswithgptext-stateUsethe gptext-state utilitytodetermineifanyprimaryormirrorsegmentsaredown.See gptext-state intheGPTextManagementUtilitiesReference.
©CopyrightPivotalSoftware,Inc,2013-2017 49 2.1.2
GPTextHighAvailabilityTheGPTexthighavailabilityfeatureensuresthatyoucancontinueworkingwithGPTextindexesaslongaseachshardintheindexhasatleastoneworkingreplica.
AGPTextindexhasoneshardforeachGreenplumsegment,sothereisaone-to-onecoorespondencebetweenGreenplumsegmentsandGPTextindexshards.TheshardmanagedbyaGreenplumsegmentisanindexofthedocumentsthataremanagedbythatsegment.
TheGPTexthighavailabilitymechanismistomaintainmultiplecopies,orreplicas,oftheshard.TheZooKeeperservicethatmanagesSolrCloudchoosesaGPTextinstance(SolrCloudnode)foreachreplicatoensureevendistributionandhighavailability.Foreachshard,onereplicaiselectedleaderandtheGreenplumsegmentassociatedwiththeshardoperatesonthisleaderreplica.TheGPTextinstancemanagingtheleadreplicamayormaynotbeonanotherGreenplumhost,soindexingandsearchingoperationsarepassedovertheGreenplumcluster’sinterconnectnetwork.SolrCloudreplicateschangesmadetotheleaderreplicatotheremainingreplicas.
ThefollowingfigureillustratestherelationshipsbetweenGreenplumsegmentsandGPTextindexshardsandreplicas.Theleaderreplica(green)
Thenumberofreplicastocreateforeachshard,thereplicationfactor,isaSolrCloudproperty.Bydefault,GPTextstartsSolrCloudwithareplicationfactorofthree.ThereplicationfactorforeachindividualindexisthevalueoftheSolrCloudreplicationfactorwhentheindexiscreated.Changingthereplicationfactordoesnotalterthereplicationfactorforexistingindexes.
GreenplumSegmentorHostFailureIfaGreenplumprimarysegmentfailsanditsmirrorisactivated,GPTextfunctionsandutilitiescontinuetoaccesstheleaderreplica.Nointerventionisneeded.
Ifahostintheclusterfails,bothGreenplumandGPTextareaffected.MirrorsfortheGreenplumprimarysegmentslocatedonthefailedhostareactivatedonotherhosts.SolrCloudelectsanewleaderreplicaforaffectedshards.BecauseGreenplumsegmentmirrorsandGPTextshardreplicasaredistributedthroughoutthecluster,asinglehostfailureshouldnotpreventtheclusterfromcontinuingtooperate.Theperformanceofdatabasequeriesandindexingoperationswillbeaffecteduntilthefailedhostisrecoveredandtheclusterisbroughtbackintobalance.
ZooKeeperClusterAvailabilitySolrCloudisdependentonaworking,availableZooKeepercluster.ForZooKeepertobeactive,amajorityoftheZooKeeperclusternodesmustbeupandabletocommunicatewitheachother.AZooKeeperclusterwiththreenodescancontinuetooperateifoneofthenodesfails,sincetwoisamajorityofthree.Totoleratetwofailednodes,theclustermusthaveatleastfivenodessothatthenumberofworkingnodesremainingafterthefailureareamajority.Totoleratennodefailures,then,aZooKeeperclustermusthave2*n*+1nodes.ThisiswhyZooKeeperclustersusuallyhaveanoddnumberofnodes.
Thebestpracticeforahigh-availabilityGPTextclusterisaZooKeeperclusterwithfiveorsevennodessothattheclustercantoleratetwoorthreefailednodes.
©CopyrightPivotalSoftware,Inc,2013-2017 50 2.1.2
ManagingGPTextClusterHealthGPTextdocumentindexingandsearchingservicesremainavailableaslongaseachshardofanindexhasatleastoneworkingreplica.Toensureavailabilityintheeventofafailure,itisimportanttomonitorthestatusoftheclusterandensurethatalloftheindexshardreplicasarehealthy.YoucanmonitortheSolrCloudclusterandindexesusingtheSolrCloudDashboardorusingGPTextfunctionsandmanagementutilities.AccesstheSolrCloudDashboardwithawebbrowseronanyGPTextinstancewithaURLsuchas http://sdw3:18983/solr .(TheportnumbersforGPTextinstancesaresetwiththeGPTEXT_PORT_BASE parameterintheinstallationparametersfileatinstallationtime.)
RefertotheApacheSolrClouddocumentationforhelpusingtheSolrCloudDashboard.
MonitoringtheClusterwithGPTextTheGPText gptext-state managementutilityallowsyoutoquerythestateoftheGPTextclusterandindexes.Youcanalsouse gptext.index_status() toviewthestatusofallindexesoraspecifiedindex.
ToseetheGPTextclusterstaterunthe gptext-state command-lineutilitywiththe -d optiontospecifyadatabasethathastheGPTextschemainstalled.
gptext-state-dmydb
TheutilityreportsanyGPTextnodesthataredownandliststhestatusofeveryGPTextindex.Foreachindex,thedatabasename,indexname,andstatusarereported.Thestatuscolumncontains“Green”,“Yellow”,or“Red”:-Green–allreplicasforallshardsarehealthy-Yellow–allshardshaveatleastonehealthyreplicabutatleastonereplicaisdown-Red–noreplicasareavailableforatleastoneindexshard
ToseethedistributionofindexshardsandreplicasintheGPTextcluster,executethisSQLstatement.
SELECTindex_name,shard_name,replica_name,node_nameFROMgptext.index_summary()ORDERBYnode_name;
TolistallGPTextindexes,runthe gptext-statelist command.
gptext-statelist-dmydb
The gptext-statehealthcheck commandchecksthehealthofthecluster.The -f flagspecifiesthepercentageofavailablediskspacerequiredtoreportahealthycluster.Thedefaultis10.
gptext-statehealthcheck-f20-dmydb
See gptext-state intheManagementUtilitiesreferenceforhelpwithadditional gptext-state options.
Thegptext.index_status()user-definedfunctionreportsthestatusofallGPTextindexesoraspecifiedindex.
SELECT*FROMgptext.index_status();
Specifyanindexnametoreportonlythestatusofthatindex.
SELECT*FROMgptext.index_status('mydb.public.messages');
AddingandDroppingReplicasThe gptext-replica utilityaddsordropsareplicaofasingleindexshard.Usethe gptext.add_replica() and gptext.delete_replica() user-definedfunctionstoperformthesametasksfromwithinthedatabase.
Ifareplicaofashardfails,use gptext-replica toaddanewreplicaandthendropthefailedreplicatobringtheindexbackto“Green”status.
gptext-replicaadd-imydb.public.messages-sshard3
Hereistheequivalent,usingthe gptext.add_replica() function:
©CopyrightPivotalSoftware,Inc,2013-2017 51 2.1.2
SELECT*FROMgptext.add_replica('mydb.public.messages',shard3);
ZooKeeperdetermineswherethereplicawillbelocated,butyoucanalsospecifythenodewherethereplicaiscreated:
gptext-replicaadd-imydb.public.messages-sshard3-nsdw3
Inthe gptext.add_replica() function,addthenodenameasathirdargument.
Todropareplica,call gptext.delete_replica() withthenameoftheindex,thenameoftheshard,andthenameofthereplica.Youcanfindthenameofthereplicabycalling gptext.index_status(index_name) .Thenameisintheformat core_noden .Anoptional -o flagspecifiesthatthereplicaistobedeletedonlyifitisdown.
gptext-replicadrop-imydb.public.messages-sshard3-rcore_node4-o
Hereistheequivalentoftheabovecommandusingthe gptext.delete_replica() user-definedfunction.
SELECT*FROMgptext.delete_replica('mydb.public.messages','shard3','cord_node4',true);
©CopyrightPivotalSoftware,Inc,2013-2017 52 2.1.2
GPTextBestPracticesEachGPText/ApacheSolrnodeisaJavaVirtualMachine(JVM)processandisallocatedmemoryatstartup.ThemaximumamountofmemorytheJVMwilluseissetwiththe -Xmx parameterontheJavacommandline.Performanceproblemsandoutofmemoryfailurescanoccurwhenthenodeshaveinsufficientmemory.
OtherperformanceproblemscanresultfromresourcecontentionbetweentheGreenplumDatabase,Solr,andZooKeeperclusters.
ThistopicdiscussesGPTextusecasesthatstressSolrJVMmemoryindifferentwaysandthebestpracticesforpreventingoralleviatingperformanceproblemsfrominsufficientJVMmemoryandothercauses.
IndexingLargeNumbersofDocumentsIndexingdocumentsconsumesdatainSolrJVMmemory.Whentheindexiscommitted,partsofthememoryarereleased,butsomedataremainsinmemorytosupportfastsearch.Bydefault,Solrperformsanautomaticsoftcommitwhen1,000,000documentsareindexedor20minutes(1,200,000milliseconds)havepassed.Asoftcommitpushesdocumentsfrommemorytotheindex,freeingJVMmemory.Asoftcommitalsomakesthedocumentsvisibleinsearches.Asoftcommitdoesnot,however,maketheindexupdatesdurable;itisstillnecessarytocommittheindexwiththe gptext.commit()
user-definedfunction.
Youcanconfigureanindextoperformamorefrequentautomaticsoftcommitbyeditingthe solrconfig.xml filefortheindex:
$gptext-config-fsolrconfig.xml-i<db>.<schema>.<index-name>
The <autoSoftCommit> elementisachildofthe <updateHandler> element.Editthe <maxDocs> and <maxTime> valuestoreducethetimebetweenautomaticcommits.Forexample,thefollowingsettingsperformanautocommitevery100,000documentsor10minutes.
<autoSoftCommit><maxDocs>100000</maxDocs><maxTime>600000</maxTime></autoSoftCommit>
IndexingVeryLargeDocumentsIndexingverylargedocumentscanusealargeamountofJVMmemory.Tomanagethis,youcansetthe gptext.idx_buffer_size configurationparametertoreducethesizeoftheindexingbuffer.
SeeChangingGPTextServerConfigurationParametersforinstructionstochangeconfigurationparametervalues.
ConfigureMaximumJVMHeapSizeEachSolrcorefileconsumesJVMheapmemory.AddingmoreindexesincreasesJVMswappingandgarbagecollectionfrequencysothatittakeslongertocreateindexesandtoloadthecorefileswhenGPTextisstarted.IfyoucontinuetocreateindexeswithoutincreasingtheJVMheap,anoutofmemoryerrorwilleventuallyoccur.
MonitorperformanceatstartupandduringindexcreationandincreasetheJVMsizewhenyoubegintoseedegradedperformance.Youcanalsousetoolssuchasjconsole,includedwiththeJavaDeveloperKit,tomonitorJavaheapusage.Ifgarbagecollectionsareoccurringtoofrequentlyandfreeingtoolittlememory,JVMheapshouldbeincreased.
Usethe -Xmx JVMcommandlineoptiontoincreasetheJVMheapsize.Forexample,this gptext-config commandsetstheJVMmaximumheapto4GB:
$gptext-config-o"-Xmx=4096M"
ManageIndexingandSearchLoadsWithhighindexingorsearchload,JVMgarbagecollectionpausescancausetheSolroverseerqueuetobackup.ForaheavilyloadedGPTextsystem,youcanpreventsomeperformanceproblemsbyschedulingdocumentindexingfortimeswhensearchactivityislow.
©CopyrightPivotalSoftware,Inc,2013-2017 53 2.1.2
TermsQueriesandOutofMemoryErrorsThe gptext.terms() functionretrievestermsvectorsfromdocumentsthatmatchaquery.Anoutofmemoryerrormayoccurifthedocumentsarelarge,orifthequerymatchesalargenumberofdocumentsoneachnode.Otherfactorscancontributetooutofmemoryerrorswhenrunninga gptext.terms() query,includingthemaximummemoryavailabletotheSolrnodes(-Xmxvaluein JAVA_OPTS )andconcurrentqueries.
Ifyouexperienceoutofmemoryerrorswith gptext.terms() youcansetalowervalueforthe term_batch_size GPTextconfigurationvariable.Thedefaultvalueis1000.Forexample,youcouldtryrunningthefailingquerywith term_batch_size setto500.Loweringthevaluemaypreventoutofmemoryerrors,butperformanceoftermsqueriescanbeaffected.
SeeGPTextConfigurationParametersforhelpsettingGPTextconfigurationparameters.
ConfigureFileSystemCachingforZooKeeperGoodSolrperformanceisdependentonfastresponseforZooKeeperrequests.ZooKeeperperformsbestwhenitsdatabaseiscachedsoitdoesnothavetogotodiskforlookups.IfyoufindthatZooKeeperJVMshavefrequentdiskaccesses,lookforwaystoimprovefilecachingormoveZooKeeperdiskstofasterstorage.
TheZooKeeper zkClientTimeout parameteristhetimeaclientisallowedtonottalktoZooKeeperbeforehavingitssessionexpired.
©CopyrightPivotalSoftware,Inc,2013-2017 54 2.1.2
Glossary
Aanalyzer
DefinesthesetoftermsforaSolrindexfield.Ananalyzerconsistsofatokenizerandasetofoptionalfilterstobeappliedtotheinputtext.Forexample,ananalyzercanconsistofaWhiteSpaceTokenizerFactoryfollowedbyaLowerCaseFilterFactoryasafilter.Seealso:taxonomy.
Bbigram
Asequenceoftwoadjacentelementsinatokenstring.Asequenceofthreeconsecutivetokensisatrigramandasequenceofnconsecutivetokensisann-gram.
binaryclassificationTheprocessofsortingdataintooneoftwocategories,forexample,classifyingagiventextaccordingtowhetherthetextisassociatedwithapositiveornegativesentiment.
Classificationproblemswithmorethantwoclassesintowhichthedataistobecategorizedarecalledmulticlassclassificationproblems.
Ccentroid
Inclusteringproblems,acentroidrepresentstheapproximatecenterofacluster.
Acentroiddoesnothavetomapdirectlytoadatapointinthecluster.Forexample,ink-meansclusteringthecoordinatesofacentroidarethemeanofthecoordinatesofdatapoints(documents)pertainingtothatclusterandareconstantlyupdatedasnewdatapointassignmentsaremade.
clusterAsetofidenticaldatapoints.Forexample,ifsomedocumentsaretobegroupedintothreeclusters,theresultofamachinelearningalgorithmisthreeclusters:allthedocumentswithinaparticularclusteraresimilartoeachother,butdifferentfromthedocumentsinotherclusters.Theresultsandthequalityoftheclustersdependonvariousfactors,includingthealgorithmused,theparametersthatwereconfigured,andsetoffeaturesused.
collectionAcomplete,logicalindexinaGPTextorSolrCloudsystem.Alogicalindexiscomposedofmultipleshards.AGPTextcollectionhasoneshardperGreenplumsegment.
configsetTheSolrconfigurationfilesthatdefinethestructureandconfigurationofaGPTextindex,suchas managed-schema and solrconfig.xml .ThesefilesarestoredinZooKeeper.
coreAreplicaofashard,implementedinGPTextandSolrCloudasaLuceneindex.AlsocalledSolrCore.
corpusAcollectionofdocuments.Plural:corpora.
Ddictionary
Alistofuniquewordsortermsfromthedocumentsthatcomprisethevocabularyofthedocumentcollection.
dimensionAgeneralizedtermforafeatureofdata,suchaswordcountsinadocument.Dimensionsaretypicallylargeinnumber.
Ann-dimensionalvectorisavectoraccordingtowhichadocumentcanbeexpressedinann-dimensionalfeaturespace.Forexample,ifyour
©CopyrightPivotalSoftware,Inc,2013-2017 55 2.1.2
dictionarycontainsnuniqueterms,adocumentcouldbeexpressedinann–dimensionalvectorwhereeachpositioncontainsthecountwithwhichaparticulartermfromthedictionaryappearsinthatdocument.Afeaturespacecouldbetheentiredictionaryorcouldbeanotherdictionary(orsetoffeatures)extractedbyusingfeatureselection.
dimensionalityreductionTheprocessofreducingthedimensions(orfeatures)accordingtowhichthedata(ordocuments)isexpressedinafeaturespace.Forexample,selectingterms(orfeatures)fromthedictionarythatappearinmorethankdocumentsintheentirecorpusgivesonesetofreduceddimensions.
Lleadreplica
Areplicathathasbeenelectedleaderforashard.Whendocumentsareindexed,SolrCloudsendsthemtotheleadreplicafortheshardandtheleaddistributesthemtoalloftheshard’sremainingreplicas.Alsocalledleader.
Mmachinelearning
Abranchofartificialintelligencethatfocusesontheconstructionandstudyofsystemsthatcanlearnfromdata.
Nnaturallanguageprocessing
Afieldofstudythatcombinescomputerscience,artificialintelligence,andliguisticstostudyinteractionsbetweencomputersandhumanlanguages.
Oontology
Theformalrepresentationofknowledgeasasetofconcepts(ideas,entities,events)andtheirpropertiesandrelationsaccordingtoasystemofcategorieswithinadomain.Ontologiesprovidethestructuralframeworksfororganizinginformationforfieldssuchasartificialintelligence.Ontologyisnotasynonymfortaxonomy.
Pproximity,termproximity
Asearchthatlooksfordocumentsinwhichtwoormoreseparatelymatchingtermoccurencesarewithinaspecifieddistance(anumberofintermediatewordsorcharacters).
Qqueryparser
Acomponentthatparsestheinputqueriesprovidedforsearch.
Rreplica
Asinglecopyofashard,managedasaSolrcore.Onereplicaforeachshardiselectedastheleadreplica,orleader.Allupdatesandsearchesaredirectedtotheleadreplica;changesarereplicatedfromtheleadtotheremainingreplicas.
©CopyrightPivotalSoftware,Inc,2013-2017 56 2.1.2
Ssentimentanalysis
Classifiesopinionsexpressedintextdocumentsintocategoriessuchas“positive”and“negative”.
shardAlogicalpieceorsliceofacollection.InGPText,thereisoneshardforeachGreenplumDatabasesegment.Ashardismadeupofoneormorereplicas.Onereplicaiselectedtheleadshard,orleader,andupdatestotheleaderarereplicatedtotheotherreplicas.
silhouettecoefficient(SC)Aquantitivemeasureofdataclusteringperformance.SCmeasureshowtightlygroupedallthedataintheclusterare.Itsvaluesrangebetween–1and1.Valuesnear1indicatethatclusteringwasgood,andvaluesnear-1indicatethatclusteringwasnotgoodandthedatapointmusthavebeenassignedtoanothercluster.Valuesnear0indicatethattheclusterassignmentwasambiguous,andthedatapointissomewhereontheboundaryofthecluster.
sparsevectorAvectorwhoseelementsaremostlyzerosorareunpopulated.SeetheGettingStartedwithGPTextGuideformoreinformation.
stemThepartofawordthatiscommontoallitsinflectedvariants(howyoumodifyawordtoexpressitsdifferentgrammaticalcategories,forexample,byconjugatingaverb).Forexample,receives,receiving,andreceivedallderivefromthestem“receiv”.
stemmingTheprocessforreducinganinflectedorderivedwordtoitsstem,base,orrootform.Thestemisnotnecessarilythesameastherootform.Forexample,receives,receiving,andreceivedallderivefromthestem“receiv”;therootformisreceive.
supportvectormachine(SVM)Asupervisedlearningmodelthatclassifiesdatabyanalyzingthedata,recognizingpatternsinthedata,andplacingthedatainspecificclasses.Applicationsincludesentimentanalysis,separatingspamemailfromlegitimateemail,and,iftheSortingHatwereanSVM,determiningtheHousetowhichnewHogwartsstudentsareassigned.
Ttaxonomy
Ahierarchicalsystemofclassification;amethodfordividingterms,concepts,orotherentitiesintoorderedgroupsorcategories.Taxonomiesdifferfromontologiesinthattheyaregenerallyfocused,simpletreerelationships,andontologieshavewider,broaderscopes.
termAdistinctwordorvaluewithinafield.
TF-IDFscoreTermfrequency-inversedocumentfrequency.Anumericstatisticthatreflectshowimportantawordistoasetofdocuments.Thetf-idfscoreincreasesproportionatetothenumberoftimesawordappearsinadocument.
TF-IDFvectorAvectorcontainingtf-idfscores.
tokenUnitsintowhichaninputstringisbroken.Forexample,atokencanbeindividualtermsinabigram(multipleterms)ortrigramthatappearintheinputtext.
tokenizerBreaksastreamoftextintotokensbasedondelimiters,theseparatorsthatspecifythecharacterstoconsiderasthetokenboundaries,orsomeregularexpressions.Forexample,adelimitercouldbeawhitespace.
Tokenizersarenotawareoffieldsinadocument.
tokenfilterTakesastreamoftokensproducedbyatokenizer,examineseachtoken,andeitherpassesthetokenalongordiscardsit.Forexample,atokenfiltermayremovewhitespace,unnecessarywordssuchas“a”,“an”,or“the”,orremovedotsfromacronyms.Tokenfiltersproduceanotherstreamoftokensthatcanbeinputtoothertokenfilters.
U
©CopyrightPivotalSoftware,Inc,2013-2017 57 2.1.2
UniversalQueryParserTheGPTextUniversalQueryParserparsesqueriescontainingexpressionssupportedbyanysupportedqueryparser.
ZZooKeeper
ApacheZooKeeperisaserverproductthatprovidesclustermanagementservicessuchascentralizedconfigurationmanagement,loadbalancing,andfailover.ItisaSolrCloudrequirement.ZooKeeperhandlesleaderelectionsforSolrCloudreplicas.Forhighavailability,ZooKeeperisdeployedasaclusterof3,5,or7nodes.TheZooKeeperclustercanbedeployedwithGPTextonthesameclusterhostsoranexistingZooKeeperclustercanbeused.
©CopyrightPivotalSoftware,Inc,2013-2017 58 2.1.2
GPTextReferencesThisguideprovidesreferenceinformationaboutPivotalGPTextfunctionsandcommand-lineutilities.
GPTextFunctions
ManagementUtilities
GPTextandSolrDataTypeMappings
GPTextConfigurationParameters
©CopyrightPivotalSoftware,Inc,2013-2017 59 2.1.2
GPTextFunctionReferenceThefollowingfunctionsareavailableinPivotalGPText.
Indexinggptext.create_index()–createsanemptyindex.
gptext.index()–populatesanindex.
gptext.commit_index()–finalizesindexoperations.
gptext.enable_terms()–enablestermvectorsandpositionstoallowextractingtermsandtheirpositionsfrom text fields.
ModifyingorDeletinganIndexgptext.add_field()–addsafieldtoanindex.
gptext.delete()–deletesdocumentsmatchingasearchquery.
gptext.drop_field()–deletesafieldfromanindex.
gptext.drop_index()–deletesanindex.
Searchgptext.search()–searchesanindex.
gptext.search_count()–returnsnumberofdocumentsthatmatchsearch.
gptext.gptext_retrieve_field–extractsasinglefieldfromthe rf searchresultcolumnastext.
gptext.gptext_retrieve_field_int–extractsasinglefieldfromthe rf searchresultcolumnandconvertstoaninteger.
gptext.gptext_retrieve_field_float–extractsasinglefieldfromthe rf searchresultcolumnandconvertstoafloat.
gptext.highlight()–returnssearchresultwithsearchtermhighlighted.
FacetedSearchgptext.faceted_field_search()–search,facetedbyfields.
gptext.faceted_query_search()–search,facetedbyqueries.
gptext.faceted_range_search()–search,facetedbydefinedranges.
WorkingWithTermsgptext.enable_terms()–enablestermvectorsandpositions.
gptext.terms()–getsthetermvectorsfortheindexeddocumentsinaSolrindexforthespecifiedfield.
ConfigurationandMonitoringgptext.cluster_status()-showsstatusofindexesmanagedbytheGPTextcluster.
©CopyrightPivotalSoftware,Inc,2013-2017 60 2.1.2
gptext.index_status()–showsstatusofreplicasforanindexorforallindexes.
gptext.partition_status()-listspartitionedindexesandchildpartitions.
gptext.reload_index()–reloadsSolrconfigurationfiles.
gptext.version()–returnsversionofGPTextinstallation.
HighAvailabilitygptext.add_replica()–Addsareplicaofanindexshard.
gptext.delete_replica()–Deletesareplicaofanindexshard.
GeneralPurposeFunctionsgptext.count_t()–countsnumberofrowsinatable.
PrivilegesYourprivilegestoexecutetheGPTextfunctionsdependonyourGreenplumDatabaseprivilegesforthetablefromwhichtheindexisgenerated.Forexample,ifyouhaveSELECTprivilegesforatableintheGreenplumdatabase,youhaveSELECTprivilegesforanindexgeneratedfromthattable.
ExecutingindexfunctionsrequiresoneofOWNER,SELECT,INSERT,UPDATE,orDELETEprivileges,dependingonthefunction.TheOWNERisthepersonwhocreatedthetableandhasallprivileges.SeetheSecuritysectionoftheGPTextUser’sGuideforinformationaboutsettingprivileges.
ThePrivilegesrequiredsectionforeachoftheGPTextfunctionsspecifiestheprivilegesrequiredtoexecutethatfunction.
UsageThe gptext functionsinthissectionmustbeexecutedasSQLqueriesintheform:
SELECT*FROMgptext.function();
YoumustrunthesequeriesfromtheGreenplumDatabasemaster.
Theexamplesinthissectionuseoneofthefollowing:
AGreenplumdatabasenamed wikipedia setupasfollows:
A public schema.Atablenamed articles .The articles tablehasthefollowingcolumns:id , date_time , title , content , refs .Thedefaultsearchcolumnis content .Thenameoftheindexonthe articles tableis wikipedia.public.articles .
IndexingIndexingfunctionscreate,setup,populate,andfinalize(commit)Solrindexes.
gptext.create_index()CreatesanemptySolrindex.
©CopyrightPivotalSoftware,Inc,2013-2017 61 2.1.2
Syntax
gptext.create_index(schema_name,table_name,id_col_name,def_search_col_name[,if_check_id_uniqueness])
or
gptext.create_index(schema_name,table_name,p_columns,p_types,id_col_name,def_search_col_name[,if_check_id_uniqueness])
Parameters
schema_name
ThenameoftheschemaintheGreenplumdatabase.table_name
ThenameofthetableintheGreenplumdatabase.Ifthetableispartitionedthismustbethenameoftheroottable.p_columns
Atextarraycontainingthenamesofthetablecolumnstoindex.If p_columns and p_types areomitted,alltablecolumnsareindexed.
Thecolumnsmustbevalidcolumnsinthetable.Thecolumnsidentifiedbythe id_col_name and def_search_col_name mustbeincludedinthearray.
Ifthe p_columns parameterissupplied,the p_types parametermustalsobesupplied.Thesizesofthe p_columns and p_types arraysmustbethesame.
p_types
AtextarraycontainingtheSolrdatatypesofthecolumnsinthe p_columns array.
Texttypescanbemappedtothenameofananalyzerchain,forexample text_intl , text_sm ,oranytypedefinedinthe managed_schema .SeeMapGreenplumDatabaseDataTypestoSolrDataTypesforequivalentSolrdatatypesforotherGreenplumtypes.
The p_types parametermustbesuppliedifthe p_columns parameterissupplied.
id_col_name
Thenameofacolumnin table_name thatisuniqueforeachrow.Thecolumnmustbeoftype int4 , int8 , varchar , text ,or uuid .def_search_col_name
Thenameofthedefaultcolumntosearchin table_name ,ifnoothercolumnisnamedinaquery.if_check_id_uniqueness
Optional.ABooleanvalue.Thedefaultistrue.Settofalsetoindexatablewithanon-uniqueIDfield.
Returntype
boolean
Privilegesrequired
OnlytheOWNERcanexecutethisfunction.
Remarks
AGPTextindexisaSolrcollection.
Thecontentsofthe id_col_name columnshould,inmostcases,beauniqueIDforeachrow.Itmustbeoftype int4 , int8 , varchar ,or text .
Ifthe if_check_id_uniqueness argumentistrue,thedefault,adocumentwithanIDmatchinganexistingIDcannotbeaddedtotheindex.
Ifthe if_check_id_uniquess argumentisfalse,documentswithduplicateIDsareallowedtobeaddedtotheindex.ThecontentofotherfieldsmayormaynotbethesameasexistingdocumentswiththesameID.WhenaqueryreturnsmultipledocumentswiththesameID,itistheuser’sresponsibilitytoanticipateandhandlethemultipledocuments.Forexample,atablecouldhavea revision columnthatisincrementedwhenanewversionofadocumentisaddedtotheindex,allowingqueriesthatomitallbutthemostrecentversionfromsearchresults.
©CopyrightPivotalSoftware,Inc,2013-2017 62 2.1.2
Thenameoftheindexcreatedhastheformat:
<database_name>.<schema_name>.<table_name>
Whenthetableispartitioned,theGPTextindexcreatedforthetablewillcontainrecordsforallpartitions.Ifyouspecifythenameofasubpartitiontableinthisfunctionanerrorisreturned.Theindexrecordsfordocumentsaddedtotheindexhavea __partition fieldcontainingthenameofthechildpartitiontable.SeeSearchingPartitionedTablesforsyntaxtosearchbypartitions.
Populatethenewindexwithgptext.index().
Thenumberofreplicasforeachshardisdeterminedwhentheindexiscreated.Itisthevalueofthe gptext.replication_factor serverconfigurationparameter,2bydefault.
Ifthe gptext.failover_factor serverconfigurationparameterisset, gptext.create_index() failsiftheratioofthenumberofGPTextnodesthatareuptothetotalnumberofGPTextnodesislessthanthe gptext.failover_factor value(from0.0to1.0).IndexshardscanonlybecreatedonactiveGPTextnodes,sothegptext.failover_factor parameterpreventsoverloadingtheactiveGPTextnodeswhentoomanynodesaredown.
Toindexapartitionedtable,specifythenameoftheroottable.The gptext.index() functionreturnsanerrorifyouspecifythenameofachildpartitiontable.
ExampleCreateanindex, wikipedia.public.articles ,with content asthedefaultsearchfield. sql=>SELECT*FROMgptext.create_index('public','articles','id','content');
Createanindex, wikipedia.public.articles ,with content asthedefaultsearchfield.Indexthe id , content ,and title fields. sql=>SELECT*FROMgptext.create_index('public','articles','{"id","content","title"}','{"long","text","text"}','id','content')
gptext.enable_terms()Enablestermvectorsandpositionstoallowextractingtermsandtheirpositionsfromfieldsofdatatype text .
Syntax
gptext.enable_terms(index_name,field_name)
Parameters
index_name
Thenameoftheindexforwhichyouwanttoenableterms.field_name
Thenameofthefieldforwhichyouwanttoenableterms.
Returntype
boolean
Privilegesrequired
OnlytheOWNERcanexecutethisfunction.
Remarks
Solrcanmarktermsandtheirpositionsindocumentswhenindexing.Thiscapabilityisdisabledbydefault.Use gptext.enable_terms() toenablethecapability.
©CopyrightPivotalSoftware,Inc,2013-2017 63 2.1.2
Call gptext.enable_terms() foreachfieldwhereyouwanttoenableterms.
Aftercallingthisfunction,youmustindexorre-indexwithgptext.index().
Example
=#SELECT*FROMgptext.enable_terms('wikipedia.public.articles','content');WARNING:Enabletermsforfield:contentofindex:wikipedia.public.articlessuccessfully.Reindexdataneeded.enable_terms--------------t(1row)
SELECT*FROMgptext.index(TABLE(SELECT*FROMarticles),'wikipedia.public.articles');SELECT*FROMgptext.commit_index('wikipedia.public.articles');
gptext.index()Populatesanindexbyindexingdatainatable.
Syntax
gptext.index(TABLE(SELECT*FROMtable_name),index_name)
Parameters
TABLE(SELECT * FROM table_name)
Thetabletobeindexed,withdatatype anytable .index_name
Nameoftheindexthatwascreatedwith gptext.create_index() andistobepopulated.
Returntype
SETOFdbidINT,num_docsBIGINT
where dbid isthe dbid ofthesegmentthatthedocumentsweresentto,and num_docs isthenumberofdocumentsthatwereindexed.
Privilegesrequired
YoumusthavetheINSERTorUPDATEprivilegetoexecutethisfunction.
Remarks
index_name musthavebeencreatedwith gptext.create_index() .
Theargumentstothe gptext.index() functionmustbeexpressions.Forexample,TABLE(SELECT*FROMarticles)createsa“table-valuedexpression”fromthearticlestable,usingthetablefunction TABLE .
Youcanselectivelyindex/updatebychangingtheinnerselectlistinthequery.
Aftersuccessfullyindexing,youmustcommittheindexwith gptext.commit_index(index_name) .
Theoutputincludesatwo-columntablewith dbid (theGreenplumsegmentID)and num_docs (thenumberofdocumentsaddedtotheindexforthatsegment)asthecolumns.
Note:
©CopyrightPivotalSoftware,Inc,2013-2017 64 2.1.2
Becarefulaboutdistributionpolicies:
Thefirstparameterof gptext.index() is TABLE(SELECT*FROMmessages)
.Thequeryinthisparametershouldhavethesamedistributionpolicyasthe
tableyouareindexing.Therearetwocaseswherethequerywillnothavethesamedistributionpolicy:
1. Yourqueryisajoinoftwotables
2. Youareindexinganintermediatetable(stagingtable)thatisdistributeddifferentlythanthefinaltable.
Whenthedistributionpoliciesdiffer,youmustspecify "SCATTERBY"
forthequerylikethis:
TABLE(SELECT*FROMmessagesSCATTERBYdistrib_id) ,
where distrib_id isthedistributionidusedwhenyoucreatedyourprimary/finaltable.
Example
=#SELECT*FROMgptext.index(TABLE(select*FROMarticles),'wikipedia.public.articles');dbid|num_docs------+----------3|62|5(2rows)
gptext.commit_index()Finishesanindexoperation.Theresultsofanindexingoperationarenotavailableuntilthisfunctioniscalledfortheindex.
Syntax
gptext.commit_index(index_name)
Parameters
index_name
Thenameoftheindextocommit.Ifthetableispartitionedthismustbethenameoftheroottable.
Returntype
boolean
Privilegesrequired
YoumusthavetheINSERT,UPDATE,orDELETEprivilegetoexecutethisfunction.
Remarks
Mustbecalledaftergptext.index()andgptext.delete().
Example
©CopyrightPivotalSoftware,Inc,2013-2017 65 2.1.2
=#SELECT*FROMgptext.commit_index('wikipedia.public.articles');commit_index--------------t(1row)
ModifyingorDeletinganIndexYoucanchangeanindexbyaddingordroppingfields,revertinganindextoitspreviousstate,ordeletingtheindex.
gptext.add_field()Addsafieldtoyourschemaifthefieldwasaddedtothedatabaseaftertheindexwascreated.
Syntax
gptext.add_field(index_name,field_name[,is_default_search_col[,if_enable_terms]])
Parameters
index_name
Thenameoftheindextowhichyouwanttoaddthefield.Ifthetableispartitionedthismustbethenameoftheroottable.field_name
Thenameofthefieldtobeindexed.is_default_search_col
Optional.Booleanvalue.Isthistobecomethedefaultsearchcolumn(field)?if_enable_terms
Optional.Booleanvalue.EnabletermssupportonthisfieldwhenaddedtotheGPTextindex.
Returntype
SETOFboolean
Privilegesrequired
OnlytheOWNERcanexecutethisfunction.
Remarks
Callthisfunctionforeachfieldyouadd.
Beforeandafteryouaddoneormorefields,reloadtheSolrconfigurationfileswithgptext.reload_index().Theinitial reload_index() callisrequiredbecauseofSolr4.0behaviorandmaynotberequiredinsubsequentversions.
Afteryouaddoneormorefields,youmustalsocreateandpopulateanewindexwithgptext.create_index() and gptext.index(),thencommitwithindexwithgptext.commit_index().
Example
Addsthefield external_links totheindex,thenrecreates,repopulates,andcommitstheindex.
©CopyrightPivotalSoftware,Inc,2013-2017 66 2.1.2
=#SELECT*FROMgptext.reload_index('wikipedia.public.articles');=#SELECT*FROMgptext.add_field('wikipedia.public.articles','external_links',false,false);INFO:Addfield:external_linksforindex:wikipedia.public.articlesadd_field-----------t(1row)
=#SELECT*FROMgptext.reload_index('wikipedia.public.articles');reload_index--------------t(1row)
SELECT*FROMgptext.commit_index('wikipedia.public.articles');
gptext.delete()Deletesalldocumentsthatmatchthesearchquery.
Syntax
gptext.delete(index_name,query)
Parameters
index_name
Thenameoftheindex.query
Documentsmatchingthisquerywillbedeleted.Todeletealldocumentsusethequery '*' or '*:*' .
Returntype
boolean
Privilegesrequired
YoumusthavetheDELETEprivilegetoexecutethisfunction.
Remarks
Afterasuccessfuldelete,committheindexasfollows.
gptext.commit_index(index_name)
ExamplesDeletealldocumentscontainingtheword“unverified”inthedefaultsearchfield:
=#SELECT*FROMgptext.delete('wikipedia.public.articles','unverified');delete--------t(1row)=#SELECT*FROMgptext.commit_index('wikipedia.public.articles');commit_index--------------t(1row)
©CopyrightPivotalSoftware,Inc,2013-2017 67 2.1.2
Deletealldocumentsfromtheindex:
wikipedia=#SELECT*FROMgptext.delete('wikipedia.public.articles','*:*');delete--------t(1row)
wikipedia=#SELECT*FROMgptext.commit_index('wikipedia.public.articles');commit_index--------------t(1row)
gptext.drop_field()Removesafieldfromyourschema.
Syntax
gptext.drop_field(index_name,field_name)
Parameters
index_name
Thenameoftheindexfromwhichtodropthefield.Ifthetableispartitionedthismustbethenameoftheroottable.field_name
Thenameofthefieldtodrop.
Returntype
boolean
Privilegesrequired
OnlytheOWNERcanexecutethisfunction.
Remarks
Callthisfunctionforeachfieldyoudrop.
Beforeandafterdroppingoneormorefields,youmustreloadtheSolrconfigurationfileswithgptext.reload_index(),thencommittheindexwithgptext.commit_index().
Thecolumn __partition inindexesforpartitioneddatabasetablescannotbedropped.
Theinitial reload_index() isrequiredbySolr4.0behaviorandmaynotbenecessaryinsubsequentversions.
Example
Dropsthefield external_links fromtheindex.
©CopyrightPivotalSoftware,Inc,2013-2017 68 2.1.2
=#SELECT*FROMgptext.reload_index('wikipedia.public.articles');reload_index--------------t(1row)
=#SELECT*FROMgptext.drop_field('wikipedia.public.articles','external_links');INFO:Dropfield:external_linksforindex:wikipedia.public.articlesdrop_field------------t(1row)
=#SELECT*FROMgptext.reload_index('wikipedia.public.articles');reload_index--------------t(1row)
=#SELECT*FROMgptext.commit_index('wikipedia.public.articles');commit_index--------------t(1row)
gptext.drop_index()Removesanindex.
Syntax
gptext.drop_index(index_name)
Parameters
index_name
Thenameoftheindextodrop.Ifthedatabasetableispartitioned,thismustbethenameoftheroottable.
Returntype
boolean
Privilegesrequired
OnlytheOWNERcanexecutethisfunction.
Remarks
Adroppedindexcannotberecovered.
Example
=#SELECT*FROMgptext.drop_index('wikipedia.public.articles');drop_index------------t(1row)
©CopyrightPivotalSoftware,Inc,2013-2017 69 2.1.2
SearchSearchfunctionsenablequeryinganindex.
ChangingthequeryparseratquerytimeWhenusingthesearchfunctions,youcanchangethequeryparserusedbySolratquerytime.Adifferentqueryparsermayberequired,dependingonthenatureofthequery.SeetheUsingAdvancedQueryingOptionsforalistofthequeryparsersGPTextsupports.
Tochangethequeryparseratquerytime,usethe defType Solroptionwiththe gptext.search() function.
Tochangethequeryparserforanysearchfunctionatquerytime,usetheSolr localParams syntax,replacingthe <query> termwith '{!type=edismax}<query>'
.
WiththeGPTextUniversalQueryParser,youcanusefeaturesofanyofthesupportedqueryparsersinonequery.TousetheUniversalQueryParser,replacethe <query> termwith '{!gptextqp}<query>' .SeeUsingtheUniversalQueryParserforinformationandexamples.
gptext.search()Searchesanindex.
Syntax
gptext.search(src_tbl,index_name,search_query,filter_queries)
or:
gptext.search(src_tbl,index_name,search_query,filter_queries[,options])
Parameters
src_table
Specifiesa SELECT statementonanexisting,indexedtableonwhichtoperformthesearch.The src_table parameterisananytabledatatype,specifiedinthefollowingformat
TABLE(SELECT*FROM<src_table>)
index_name
Thenameoftheindextosearch.Ifthedatabasetableispartitionedyoucanspecifythenameofasub-partitiontabletosearch.search_query
TextvaluecontainingaSolrtextsearchquery.filter_queries
Atextarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .options
Anoptionalampersand-delimitedlistofSolrqueryparameters.SeeSolroptions.
Returntype
SETOFgptext.search_scored_result
Thisisacompositetypewiththefollowingcolumns:
Column Type
id text
score doubleprecision
©CopyrightPivotalSoftware,Inc,2013-2017 70 2.1.2
hs(conditional) gptexthstore
rf(conditional) textColumn Type
The id columnisreturnedastext,evenifthe id_col specifiedinthe gptext.create_index() functionisanintegertype.Ifyouorderresultsby id orjoinsearchresultswiththeoriginaltableon id ,youmustcastthereturned id columntothecorrectintegertypeinyourquery.Forexample,thefollowingsearchqueryreturnsresultssortednumericallybythe id column:
SELECTid,scoreFROMgptext.search(table(select1scatterby1),'demo.public.test','*:*',null)ORDERBYid::INT8,2;
Ifthe options parameterisincludedin gptext.search() ,theresultincludesthe offsets column.Thiscolumncontainskey-valuepairs,wherethekeyisthecolumnnameandthevalueisacomma-separatedlistofoffsetstolocationswherethesearchtermoccurs.Thisdataisusedbythegptext.highlight()functiontoaddhighlightingtagstothecolumndata.Ifhighlightingisnotenabledwiththe 'hl=true' option,the offsets columnis NULL .
Ifthe fl optionisincludedinthe options parametertospecifyadditionalfieldstoaddtotheresult,the rf columncontainstheadditionalfieldsinaformattedtextvalue.Thegptext.gptext_retrieve_field()functioncanbeusedtoextractasinglefieldvaluefromthe rf column.Therearevariantsofthegptext.gptext_retrieve_field()functiontoretrieveintegerandfloatvaluesfromthe rf columnvalue.
Privilegesrequired
YoumusthavetheSELECTprivilegetoexecutethisfunction.
Solroptions
Solrqueriesallowthefollowingoptionalrefinements,specifiedasanampersand-delimitedlistinthe options parameter.
defType
Thenameofthequeryparsertouseforthisquery.
Example: defType=edismax
rows
Themaximumnumberofrowstoreturnpersegment.Ifomitted,allrowsarereturned.
Example: rows=100 returns100rowspersegmentorallrowsiftherearefewerthan100.
sort
Sortsonafieldorscoreinascendingordescendingorder.
Examples:
score desc (defaultifnosortdefined)
date_time asc
date_time asc score desc sortson date_time ascending,thenon score descending
start
Thenumberofthefirstrecordtoreturn.
Examples:
start=0 default:returnedrecordsstartswithrecord0
start=25 returnedrecordsstartswithrecord25
hl
Enablehighlighting.
Example: hl=true
hl.fl
Comma-separatedlistoffieldnamestoconsiderwhenhighlighting.
Examples:
©CopyrightPivotalSoftware,Inc,2013-2017 71 2.1.2
hl.fl=field1
hl.fl=field1,field2
fl
Comma-separatedlistoffieldstoincludeinsearchresults.Thefieldsmustbesetto stored=true inthemanaged-schemafortheindex.
Example: fl=version,author
RemarksTheoutputincludesatablewithcolumns id (theIDnamedingptext.create_index())and score (the tf-idf score).Acolumnnamed offsets isincludedifhighlightingisspecifiedinthe options parameter.Acolumnnamed rf isincludedwhenalistofadditionalfieldstoincludeisspecifiedinthe options parameter.
Tochangethequeryparseratquerytime,specifythe defType optionintheoptionsparameterlist.Forexample,settingtheoptionsparameterto’rows=100&defType=edismax ’limitstheoutputto100rowspersegmentandwillchangethequeryparserto edismax .
The TABLE queryisplannedandaffectstheestimatefor gptext.search() ,butdoesnotexecute.Forexample,ifyourqueryincludes
gptext.search(TABLE(SELECT*FROMt),...)
thequeryplannerestimatesthenumberofresultsasthenumberofrowsin t .Thiscancausethequeryplannertoignoretheuseofanindexscan.Useaquerylike TABLE(SELECT1SCATTERBY1) toavoidthisissue.
Ifyoudonotspecifyoptions, gptext.search() returnsallrows.
Theoptionsseparatorhaschangedfromcommatoampersand(&)inordertosupporthighlighting.Ifyoudonotusehighlighting,youcanreverttousingthecommaseparatorbysettingthe gptext.search_param_separator to ',' .
TheSolroption rows specifiesthemaximumnumberofrowstoreturnpersegment.Forexample,ifyouhavefoursegments, ‘rows=100' returnsatmostatotalof400rows.Tolimitthenumberofrowsreturnedforanentirequery,seta LIMIT intheSQLquery.Forexample,thefollowingqueryreturnsatmost20rowsforthequery: selectt.id,q.score,t.message\_textfromtwitter.messaget,gptext.search(...)qwheret.id=q.id::int8LIMIT'20';
The gptexthstore typeisalimitedformofthePostgres hstore type.Itonlyhasthe hstore inputandoutputfunctionsimplemented,as gptext_hstore_in
and gptext_hstore_out .
ExamplesRunsaGPTextquerythatlooksforWikipediaarticlesthatcontaintheterm“optimization”,andjoinstheresultstotheoriginalGreenplumDatabasearticles table:
=#SELECTa.id,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT*FROMarticles),'wikipedia.public.articles','optimization',null)qWHEREa.id=q.id::int8ORDERBYscoreDESC;id|title|score----------+----------------+-------------43835553|Cubesort|0.05420493329050607|Cyclesort|0.0442581446508027|Treesort|0.03613662329352|Selectionsort|0.03433720420039|Mergesort|0.0335375933268249|Quicksort|0.03090701825977485|Bubblesort|0.02861433723954341|Timsort|0.02336350877355|Shellsort|0.02312869215205|Insertionsort|0.0228914713995|Heapsort|0.011179198(11rows)
Returns3resultsfromeachshardbeginningatrecord0:
©CopyrightPivotalSoftware,Inc,2013-2017 72 2.1.2
=#SELECTa.id,a.title,q.scoreFROMarticlesa,gptext.search(TABLE(SELECT*FROMarticles),'wikipedia.public.articles','optimization',null,'rows=3&start=0')qWHEREa.id=q.id::int8;id|title|score----------+----------------+-------------13995|Heapsort|0.01117919820039|Mergesort|0.0335375936508027|Treesort|0.03613662315205|Insertionsort|0.0228914777355|Shellsort|0.02312869223954341|Timsort|0.023363508(6rows)
Returns10rowswiththetext“iphone”highlightedinthe message column.
SELECTt.id,gptext.highlight(t.message,'message',s.hs)FROMtwitter.messaget,gptext.search(TABLE(SELECT1SCATTERBY1),'demo.twitter.message','{!gptextqp}iphone',null,'rows=10&hl=true&hl.fl=message')sWHEREt.id=s.id::int8;
gptext.search_count()Returnsthenumberofdocumentsthatmatchthesearchquery.
Syntax
gptext.search_count(index_name,search_query,filter_queries,options)
Parameters
index_name
Thenameoftheindex.search_query
Thesearchquery.filter_queries
Acomma-delimitedarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .options
Anoptionalampersand-delimitedlistofSolrqueryparameters.SeeSolroptions.
Returntype
bigint
Privilegesrequired
YoumusthavetheSELECTprivilegetoexecutethisfunction.
Remarks
None.
Example
©CopyrightPivotalSoftware,Inc,2013-2017 73 2.1.2
=#SELECT*FROMgptext.search_count('wikipedia.public.articles','bubble',null);count-------4(1row)
gptext.gptext_retrieve_field()Retrievesasinglefieldfromthe rf searchresultcolumnasatextvalue.
Syntax
gptext.gptext_retrieve_field(rf,field_name)
Parameters
rf
Thenameofthe rf column.field_name
Thenameofthefieldtoretrieve.
Remarks
The fl=<field_list> Solrsearchoptionisaddedtothe options parameterofthe gptext.search() functiontorequestadditionalstoredfields.Theadditionalfieldsarereturnedintheresultsinacolumnnamed rf .The rf columnvaluehasaformatlikethefollowing:
column_value{name:"_version"value:"1544714234398507008"}column_value{name:"revision"value:"9.70"}column_value{name:"author"value:"jdough"}
The gptext.gptext_retrieve_field() functionextractsthevalueforasinglespecifiedfieldandreturnsitasatextvalue.Ifthereisnofieldwiththespecifiednameinthe rf column,itreturnsNULL.
Storingadditionalfieldsinanindexrequiresediting managed-schema tospecifythefieldsthatshouldbestored.SeeStoringAdditionalFieldsinanIndexforinstructions.
gptext.gptext_retrieve_field_int()Retrievesasinglefieldfromthe rf searchresultcolumnasanintegervalue.
Syntax
gptext.gptext_retrieve_field_int(rf,field_name)
Parameters
rf
Thenameofthe rf column.field_name
Thenameoftheintegerfieldtoretrieve.
Remarks
©CopyrightPivotalSoftware,Inc,2013-2017 74 2.1.2
The gptext.gptext_retrieve_field_int() functionisthesameasthegptext.gptext_retrieve_fieldfunction,exceptthattheextractedfieldvalueisconvertedtoanintegervalue.
gptext.gptext_retrieve_field_float()Retrievesasinglefieldfromthe rf searchresultcolumnasafloatvalue.
Syntax
gptext.gptext_retrieve_field_float(rf,field_name)
Parameters
rf
Thenameofthe rf column.field_name
Thenameofthefloatfieldtoretrieve.
Remarks
The gptext.gptext_retrieve_field_float() functionisthesameasthegptext.gptext_retrieve_fieldfunction,exceptthattheextractedfieldvalueisconvertedtoafloatvalue.
gptext.highlight()Highlightstermsbyinsertingmarkuptagsintodata.
Syntax
gptext.highlight(column_data,column_name,offsets)
Parameters
column_data
Thetextdatafromthetablewhichistobetaggedwithhighlightingtags.column_name
Thenameofthecorrespondingcolumnfromthetable.offsets
A gptext hstore valuethatcontainskey-valuepairsthatindicatethelocationsofthetexttohighlightwithinthecolumndata.SeeRemarksforinformationaboutthe gptext hstore datatype.
PrequisiteTousehighlighting,termvectorsmustbeenabledbeforecreatingtheindex.Toenabletermvectors,callgptext.enable_terms()foreachfieldwhereyouwanttoenableterms,thenindexorre-indexwithgptext.index().
RemarksThe offsets parameterisa gptexthstore ,whereeachkeyisacolumnnameandthevalueisacomma-separatedlistofoffsetsintothecolumndata.Thishstore isconstructedbygptext.search()withhighlightingenabledinthe offsets parameter.Followingisanexampleofthe offsets hstore content:
©CopyrightPivotalSoftware,Inc,2013-2017 75 2.1.2
"field1"=>"0:5,9:14","field2"=>"13:20"
gptext.highlight() willinserttwosetsoftagsintothe field1 dataandonesetintothe field2 dataattheindicatedoffsets.
The gptexthstore typeisalimitedformofthePostgres hstore type.Ithasonlythe hstore inputandoutputfunctionsimplemented,as gptext_hstore_in
and gptext_hstore_out .
Thehighlighttagsaredefinedbythe gptext.hl_pre_tag and gptext.hl_post_tag serverconfigurationparameters.Theirdefaultvaluesare <em> and</em> ,respectively.
ExampleReturns10rowswiththetext“iphone”highlightedinthe message column.
SELECTt.id,gptext.highlight(t.message'message',s.hs)FROMtwitter.messaget,gptext.search(TABLE(SELECT1SCATTERBY1),'demo.twitter.message','{!gptextqp}iphone',null,'rows=10&hl=true&hl.fl=message')sWHEREt.id=s.id::int8;
FacetedSearchFacetingbreaksupasearchresultintomultiplecategories,showingcountsforeach.
gptext.faceted_field_search()The faceted_field_search() functionbreakssearchresultsintofieldnamecategories.
Syntax
gptext.faceted_field_search(index_name,query,filter_queries,facet_fields,facet_limit,minimum,options)
Parameters
index_name
Thenameoftheindex.query
Querystatement.Use *:* toqueryforallresults.filter_queries
Atextarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .facet_fields
Anarrayoffieldnamestofacet.UsePostgreSQLarraynotation.facet_limit
Maximumnumberofresultstobereturnedforeachaggregation(facet).minimum
Minimumnumberofresultsrequiredbeforeanaggregation(facet)willbereturned.Enter0toreturnallfacets.options
Anoptionalampersand-delimitedlistofSolrqueryparameters.SeeSolroptions.
Returntype
SETOFgptext.facet_field_result
Thisisacompositetypewiththefollowingcolumns:
©CopyrightPivotalSoftware,Inc,2013-2017 76 2.1.2
Column Type
field_name text
field_value text
value_count bigint
Privilegesrequired
YoumusthavetheSELECTprivilegetoexecutethisfunction.
Remarks
None.
ExamplesFaceton spam and truncated fields,nolimit,nominimum,alltweets:
SELECT*FROMgptext.faceted_field_search('twitter.public.message','*:*',null,'{spam,truncated}',0,0);
Faceton author_id ,nolimit,withaminimumoffivetweets,alltweets:
SELECT*FROMgptext.faceted_field_search('twitter.public.message','*:*',null,'{author_id}',0,5);
gptext.faceted_query_search()The faceted_query_search() functionbreakssearchresultsintocategoriesdefinedbyqueriesthatyouprovide.
Syntax
gptext.faceted_query_search(index_name,query,filter_queries,facet_queries,options)
Parameters
index_name
Thenameoftheindex.query
Querystatement.Use *:* toqueryforallresults.filter_queries
Atextarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .facet_queries
Type:text[].Required.Anarrayoffacetqueries.options
Anoptionalampersand-delimitedlistofSolrqueryparameters.SeeSolroptions.
Returntype
SETOFgptext.facet_query_result
Thisisacompositetypewiththefollowingcolumns:
Column Type
©CopyrightPivotalSoftware,Inc,2013-2017 77 2.1.2
query_text text
value_count bigint
Column Type
Privilegesrequired
YoumusthavetheSELECTprivilegetoexecutethisfunction.
Remarks
None.
Example
Facetonproductpriceranges(0-100,101-200,201-300,300+)ofcameras:
SELECT*FROMgptext.faceted_query_search('store.public.catalog','product_type:camera',null,'{price:[*TO100],price:[101TO200],price:[201TO300],price:[301TO*]}');
gptext.faceted_range_search()The faceted_range_search() functionbreakssearchresultsintorangecategories,withrangesdefinedbythe range_start , range_end ,and range_gapparameters.
Syntax
gptext.faceted_range_search(index_name,query,filter_queries,field_name,range_start,range_end,range_gap,options)
Parameters
index_name
Thenameoftheindex.query
Querystatement.Use *:* toqueryforallresults.filter_queries
Antextarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .field_name
Thenameofthefieldonwhichtofacet.range_start
Beginningoftherange.range_end
Endoftherange.range_gap
Sizeofrangeincrement,atextvalue.options
Anoptionalampersand-delimitedlistofSolrqueryparameters.SeeSolroptions.
Returntype
SETOFgptext.facet_range_result
©CopyrightPivotalSoftware,Inc,2013-2017 78 2.1.2
Thisisacompositetypewiththefollowingcolumns:
Column Type
field_name text
range_value text
value_count bigint
Privilegesrequired
YoumusthavetheSELECTprivilegetoexecutethisfunction.
Remarks
None.
Example
Facetondaterange,starting1yearago,endingnow,everyday:
SELECT*FROMgptext.faceted_range_search('twitter.public.message','*:*',null,'created_at','NOW/YEAR-1YEAR','NOW/YEAR','+1DAY');
WorkingwithTerms
gptext.terms()GetsthetermvectorsfortheindexeddocumentsinaSolrindexforthespecifiedfield.Youcanuse gptext.terms() tocreatetables.
Syntax
gptext.terms(src_table,index_name,field_name,search_query,filter_queries[,options])
Parameters
src_table
An anytable valuethatspecifiesa SELECT statementonanexisting,indexedtableonwhichtoperformthesearch.Specifyintheformat:
TABLE(SELECT*FROM<src_table>;)
index_name
Thenameoftheindextoqueryforterms.field_name
Thenameofthefieldtoqueryforterms.search_query
Aquerythatthefieldmustmatch.filter_queries
Acomma-delimitedarrayoffilterqueries,ifany.Ifnone,setthisparameterto null .options
Anoptional,comma-delimitedlistofSolrqueryparameters.
©CopyrightPivotalSoftware,Inc,2013-2017 79 2.1.2
Returntype
SETOFgptext.term_info
Thisisacompositetypewiththefollowingcolumns:
Column Type
id text
term text
positions integer[]
Privilegesrequired
YoumusthavetheSELECTprivilegetoexecutethisfunction.
RemarksToenableusing gptext.terms() ,executetheGPTextfunctiongptext.enable_terms(),thenreindexwithgptext.index().
The TABLE queryisplannedandaffectstheestimateofof gptext.terms() ,butdoesnotexecute.Forexample,ifyourqueryincludes:
gptext.terms(TABLE(SELECT*FROMt),...)
Thequeryplannerestimatesthenumberofresultsasthenumberofrowsin t .Thiscancausethequeryplannertoignoretheuseofanindexscan.Useaquerylike TABLE(SELECT1SCATTERBY1) toavoidthisissue.
Examples-CreateaTermsTable
CREATETABLEtwitter.termsASSELECT*FROMgptext.terms(TABLE(SELECT*FROMtwitter.message),'demo.twitter.message','message_text','iphone',null);
ConfigurationandMonitoringIndexconfigurationandmonitoringfunctionsenablemanagingindexes,trackingindexstatistics,checkingstatusofindexsegments,andensuringthatindexcontentsarecurrent.
gptext.cluster_status()ShowsSolrclusterstatus.
Syntax
gptext.cluster_status()
ReturnType
SETOFgptext.cluster_status_result
Thisisacompositetypewiththefollowingcolumns:
Column Type
index_name text
©CopyrightPivotalSoftware,Inc,2013-2017 80 2.1.2
max_shards_per_node integer
router text
replication_factor integer
auto_add_replicas boolean
znode_version integer
config_name text
partitioned boolean
Column Type
gptext.index_status()Showsstatusofreplicasforaspecifiedindexorforallindexes.
Syntax
gptext.index_status([index_name])
Parameters
index_name
Thenameoftheindex.Optional.Returnsstatusforallindexesifnoindexisspecified.
ReturnType
SETOFgptext.index_status_result
Thisisacompositetypewiththefollowingcolumns:
Column Type
content_id smallint
index_name text
shard_name text
shard_state text
replica_name text
replica_state text
core text
node_name text
base_url text
is_leader boolean
partitioned boolean
gptext.partition_status()ListsindexesonpartitionedtablesorchildpartitionnamesinthecurrentGreenplumdatabase.
Syntax
gptext.partition_status([index_name])
©CopyrightPivotalSoftware,Inc,2013-2017 81 2.1.2
Parameters
index_name
Optional.Returnspartitionstatusforallindexesifnoindexisspecified.
ReturnType
SETOFgptext.partition_status_result
Thisisacompositetypewiththefollowingcolumns:
Column Type
partition_name text
inherits_name text
level integer
Example
Listpartitionstatusforanindex.
SELECT*FROMgptext.partition_status('demo.public.tweets');partition_name|inherits_name|level------------------------------+--------------------+-------demo.public.tweets_1_prt_1|demo.public.tweets|1demo.public.tweets_1_prt_2|demo.public.tweets|1demo.public.tweets_1_prt_3|demo.public.tweets|1demo.public.tweets_1_prt_4|demo.public.tweets|1demo.public.tweets_1_prt_5|demo.public.tweets|1demo.public.tweets_1_prt_6|demo.public.tweets|1demo.public.tweets_1_prt_7|demo.public.tweets|1demo.public.tweets_1_prt_8|demo.public.tweets|1demo.public.tweets_1_prt_9|demo.public.tweets|1demo.public.tweets_1_prt_10|demo.public.tweets|1...
Remarks
The gptext.partition_status() functioncanonlylisttheindexpartitionsfortablesinthecurrentGreenplumdatabase.
gptext.reload_index()ReloadsSolrconfigurationfilesiftheyhavebeenmodified.
Syntax
gptext.reload_index([index_name])
Parameters
index_name
Optional.Thenameoftheindexforwhichtoreloadtheconfigurationfiles.
Returntype
boolean
©CopyrightPivotalSoftware,Inc,2013-2017 82 2.1.2
Privilegesrequired
OnlytheOWNERcanexecutethisfunction.
Remarks
None.
Example
=#SELECT*FROMgptext.reload_index('wikipedia.public.articles');reload_index--------------t(1row)
gptext.version()ReturnstheversionofyourGPTextinstallation,includingbuildnumber.
Syntax
SELECT*FROMgptext.version()
Parameters
None.
Returntype
text
Privilegesrequired
Youdonotneedanyprivilegestoexecutethisfunction.
Remarks
None.
Example
=#SELECT*FROMgptext.version();version-----------------------------------------GreenplumTextAnalytics2.1.0(main)(1row)
HighAvailability
©CopyrightPivotalSoftware,Inc,2013-2017 83 2.1.2
gptext.add_replica()Addsareplicaofanindexshard.
Syntax
gptext.add_replica(index_name,shard_name[,node_name])
Parameters
index_name
Nameoftheindex.Iftheindexisforapartitioneddatabasetable,thismustbethenameoftheroottable.shard_name
Nameoftheshardtoreplicate.node_name
Nameofthenodewherethereplicaistobeadded.Optional.Ifomitted,SolrCloudchoosesthenode.
Returntype
boolean
Remarks
ThisfunctionisusedbytheGPTextmanagementutility gptext-replicaadd .
Thevalueofthe gptext.replication_factor configurationparameterwhenanindexiscreateddetermineshowmanyreplicasarecreatedforeachshard.InaGreenplumsystem,therearethesamenumberofshardsasthereareGreenplumsegments.Thenumberofreplicascreatedforanewindexisthenumberofsegmentstimesthevalueofthe gptext.replication_factor configurationparameter,2bydefault.ThereplicasaredistributedevenlyamongtheliveGPTextnodes.
Replicasconsumespaceonthehostwheretheyarecreated,sotheyareusuallyonlycreatedtoreplaceareplicathathasfailedorbecomeunavailableortorelocateareplicatoanotherGPTextinstance.Whenaddingreplicas,youshouldmaintainequaldistributionofreplicasamongtheGPTextnodesandavoidplacingmultiplereplicasforthesameshardonthesamehost.
ThetotalnumberofreplicasforanindexthatcanbeplacedoneachGPTextnodeissetwhentheindexiscreated.(InSolr,thisistheMaxShardsPerNodeparameter.)GPTextsetsthislimitbycalculatingthenumberofreplicastocreatepernodeandaddinganadditionalfactor,specifiedinthegptext.extension_factor serverconfigurationparameter.Thisparametercanbesetbetween0and10;thedefaultvalueis2.Sincethelimitissetwhentheindexiscreated,itisrecommendedtosetthe gptext.extension_factor parametertoahighernumbertoallownewreplicastobecreatedwhennecessary.
Example
=#SELECT*FROMgptext.add_replica('wikipedia.public.articles','shard1');success|core_name---------+-------------------------------------------t|wikipedia.public.articles_shard1_replica4(1row)
gptext.delete_replica()Deletesanamedreplicafromthespecifiedindexandshard.
Syntax
gptext.delete_replica(index_name,shard_name,replica_name[,only_if_down])
©CopyrightPivotalSoftware,Inc,2013-2017 84 2.1.2
Parameters
index_name
Nameoftheindex.shard_name
Nameoftheshardthatcontainsthereplicatodelete.replica_name
Nameofthereplicatoremove.only_if_down
Optional.Whentrue,noactionistakenifthereplicaisactive.Defaultisfalse.
Returntype
boolean
Remarks
Usethegptext.index_status()functiontofindthenameofthereplicatodrop.Namesareintheformat core_nodeX ,where X isanumber.
Thisfunctioniscalledfromthe gptext-replicadrop managementutility.
Examples1. Deletethe core_node5 replicaifitisdown.
=#SELECT*FROMgptext.delete_replica('wikipedia.public.articles','shard1','core_node5',true);ERROR:Deletereplicafailed:Attemptedtoremovereplica:wikipedia.public.articles/shard1/core_node5withonlyIfDown='true',butstateis'active'.CONTEXT:SQLfunction"delete_replica"statement1
2. Deletethe core_node5 replicaevenifitisactive.
=#SELECT*FROMgptext.delete_replica('wikipedia.public.articles','shard1','core_node5');success---------t(1row)
GeneralPurposeFunctions
gptext.count_t()Countsthenumberofrecordsinatable.
Syntax
gptext.count_t(table_name)
Parameters
table_name
Nameofthetableforwhichtocountrecords.
Returntype
©CopyrightPivotalSoftware,Inc,2013-2017 85 2.1.2
integer
Privilegesrequired
YouneedSELECTprivilegeson table_name toexecutethisfunction.
Example
=#SELECT*FROMgptext.count_t('wikipedia.public.articles');count_t---------11(1row)
©CopyrightPivotalSoftware,Inc,2013-2017 86 2.1.2
GPTextManagementUtilitiesManagementutilitiesareGPTextcommand-lineutilitiesthatareusedtomanagetheGPTextcluster.TheutilitiesmustberunontheGreenplummasterasthegpadminuser.
ToensuretheGPTextcommand-lineutilitiescanbefoundonthepath,sourcetheGreenplumDatabaseandGPTextenvironmentscripts.TheGreenplumDatabaseenvironmentmustbesetbeforeyousourcetheGPTextenvironmentscript.Forexample,ifbothGreenplumDatabaseandGPTextareinstalledinthe/usr/local/directory,enterthesecommands:
$source/usr/local/greenplum-db-/greenplum_path.sh$source/usr/local/greenplum-text-/greenplum-text_path.sh
HelpTogethelpforautility,specifytheflag -h or --help .Ashorthelpmessagedisplayswithalistofparameters.
DebuggingTogetverboseoutputfordebuggingautility,specifytheflags -v or --verbose .
GPTextUtilitiesgptext-backup–backsupaGPTextindextoasharedfilesystem.
gptext-config–performsGPTextconfigurationoptions.
gptext-expand–addsnewGPTextnodestoexistinghostsinthecluster.
gptext-installsql–installsorremovesthegptextschemaanduser-definedfunctionsinGreenplumdatabases.
gptext-migrator-installstheGPTextbinariesintoanupgradedGreenplumDatabasesystem.
gptext-recover–restartsGPTextnodesthataredown.
gptext-replica–addsordropsareplicaofanindexshard.
gptext-restore–restoresaGPTextindexfromabackuponasharedfilesystem.
gptext-start–startsorrestartstheGPTextcluster.
gptext-state–displaythestateoftheGPTextclusterandindexes.
gptext-stop–shutsdowntheGPTextcluster.
gptext-uninstall–uninstallsGPText,includingdataandinstalledfiles,andZooKeepernodesiftheywereinstalledwiththeGPTextinstaller.
gptext-upgrade-upgradesaGPTextsystemtoanewGPTextversion.
zkManager–checkstheZooKeeperclusterstate.IfZooKeeperwasinstalledwithGPText, zkManager canstartorstoptheZooKeepercluster.
gptext-backupBacksupaGPTextindextoasharedfilesystem.
Syntax
gptext-backup-h
gptext-backup-p<path>-i<index>-n<name>[-v]
Parameters
-h
©CopyrightPivotalSoftware,Inc,2013-2017 87 2.1.2
--help Displaysausagemessageandexits.
-ppath
--pathpath
Thepathwherethesharedfilesystemismountedoneachhost.Thefilesystemmustbeaccessiblefromallhostsintheclusterandreadableandwritablebythegpadminuser.
-iindex_name
--indexindex_name ThenameoftheGPTextindextobackup.
-nbackup_name
--namebackup_name Anameforthebackup.
Example
gptext-backup-imyindex-p/mnt/backupfs/gptext-backups-nmybackup
NotesYoucanbackupanindexsothatyoucanrestoreittoadifferentGPTextsystemoravoidhavingtoreindexiftheexistingindexbecomescorrupted.
ThesharedfilesystemmustbemountedonallhostswithGPTextnodesandmustbewritablebythegpadminuser.Thefilesystemcouldbe,forexample,anNFSmountoraSSHserverwithsshfssupport.Thefilesystemmustbeconfiguredandaccessiblebeforeyouexecutethe gptext-backup utilityandabletoacceptconnectionsfromeachhostinthecluster.
The gptext-backup utilitycreatesanewsubdirectoryatthespecifiedpathwiththebackupnamespecified.Thecommandfailsifthedirectoryalreadyexists.
Whenthebackupiscomplete,thebackupdirectorycontainsthefollowing:
backup.infoAtextfilecontainingthreecomma-separatedstrings:thedatabasename,schemaname,andindexnamefortheindexthatwasbackedup.
backup.propertiesAtextfilewithpropertiesthatdescribethebackup,suchasthedateandtimethebackupstarted,thenameofthebackup,andthenamesoftheSolrcollectionandcollectionconfiguration.
zk_backupAdirectorycontainingthefollowing:
collection_state.json –aJSONfiledescribingthestatusoftheSolrcollection.
configs/<collection-name>/ –adirectorycontainingcopiesoftheSolrconfigurationfilesstoredinZooKeeperfortheindex,forexample managed-
schema , solrconfig.xml , protwords.txt , stopwords.txt .
snapshot.shard0…snapshot.shard_N_Adirectoryforeachshard,withthefilescontainingcontentoftheshard.
Ifthebackupfails—forexampleifthereisinsufficientdiskspace—anerrormessageisdisplayed,butthebackupdirectoryisnotremoved.Besuretoremovethebackupdirectorybeforerestartingthebackup.
gptext-configPerformsGPTextconfigurationtasks:
Edit,add,oruploadconfigurationfilesinZooKeeper
RevertconfiguredfilesinZooKeeper
EditJVMconfigurationoptions
©CopyrightPivotalSoftware,Inc,2013-2017 88 2.1.2
UploadjarfilestotheGPTexthomedirectory
Syntax
gptext-config-h|--help
gptext-config-ffile_name-iindex_name[-r][-e][-b]
gptext-config-aappend_file-ffile_name-iindex_name
gptext-config-upath/local_file_name-fpath/zookeeper_file_name-iindex_name
gptext-config-jpath/jar_file
gptext-config-ojvm_options
gptext-config-ffile_name-iindex_name-afile_to_append
gptext-config-ksolr_key-s<solr_value>
Parameters
-h
--help Displaysausagemessageandexits.
-iindex-name
--index=index-nameNameoftheindex.Iftheindexisforapartitionedtable,youmustspecifytherootpartitionname.
-ffilename
--file=filename
Thenameofafiletoedit,append,orupload.The -i optionmustbeincludedtospecifytheindex.Thefollowingfilesaresupported:
solrconfig.xml –ContainsmostoftheparametersforconfiguringSolritself(seeConfiguringsolrconfig.xml attheApacheSolrwebsite).
schema.xml –DefinestheanalysischainsthatSolrusesforvariousdifferenttypesofsearchfields(seeSettingupTextAnalysisChains).
stopwords.txt –Listswordsyouwanttoeliminatefromthefinalindex.Youcanalsoeditlanguagespecificstopwordsbyspecifyingafilenameintheformat stopwords_language_code.txt ,wherelanguage_code isatwo-charactercodesuchas en , fr ,or es .
protwords.txt –Listsprotectedwordsthatyoudonotwanttobemodifiedbytheanalysischain.Forexample,iPhone.
synonyms.txt –Listswordsthatyouwantreplacedbysynonymsintheanalysischain.
emoticons.txt –Definesemoticonsforthe text_sm socialmediaanalysischain(seegptext-start).
currency.txt –Definesexchangeratesbetweenonecurrencyandanother(seeWorkingwithCurrenciesandExchangeRates attheApacheSolrwebsite).
jar_file–thenameofajarfiletouploadtoGPText_Install_Directory/lib/.
-ecommand
--editor=command
Editortouse.Choicesareanyeditorthattakesafilenameonthecommandlineasaparameter.Forexample,vi,vim,emacs,nano,etc.Ifabsent,viisused.
-cstats-list
--stats_columns=stats-list
Usedwiththe -i or --index option,specifiesacomma-separatedlistofstatisticstodisplay.Thelistmaycontain replication_factor ,max_shards_per_node , num_docs ,and size_in_bytes .Ifno -c or--stats_columns optionissupplied,allfourstatisticsaredisplayed.
©CopyrightPivotalSoftware,Inc,2013-2017 89 2.1.2
-afilename
--append=filename
Appendsanamedfiletoaconfigurationfileanddistributestheresultingfiles.Requiresthe -f and -i parameters. -f namestheconfigurationfiletowhichyouwanttoappendthefilenamed(includinglocalpath)withthe -a parameter.
-r
--revert
filename
Revertnamedfiletopreviousversion.
-bn
--batch_size=nHowmanySolrinstancestoconfigureconcurrently.Thedefault(64)isgenerallymorethanenough.Alargernumbermayincreasespeed.
-ulocal_file_path
--uploadlocal_file_path
Uploadaconfigurationfileatlocal_file_pathtoZooKeeper.SpecifythedestinationZookeeperfilenamewiththe -f optionandspecifytheindexnamewiththe -i option.
-jjarfile
--jar=jarfile UploadsajarfiletoGPText_Install_Directory/lib/.
-o“JVM_Options”ModifiesJVMoptions.ToensurethattheJVMsarerestartedafterchangingJVMoptions,restarttheGPTextclusterusingthe gptext-stopand gptext-start utilities.
-k
--solr_prop=
-s
---solr_val=
Setsacustompropertyinthe solr.xml configurationfile.The-koptionspecifiesthenameoftheproperty.The -s optionspecifiesthevalue.Currently,theonlycustomGPTextpropertyisthe trackCommitproperty,whichenablesordisablescommittracking.
NotesUsethe gptext-config utilitytoedittheconfigurationfilesforaspecifiedindex.
gptext-config automaticallyreindexesaftereditingfilesiftheconfigurationchangesmaderequireit.
Ifyouusethe -f ( --file )parametertoeditoneoftheindexconfigurationfiles,GPTextautomaticallyplacestheeditedfileinitsproperdirectory.
Tomoveanindexconfigurationfilefromthelocalfilesystemtotheindexconfigurationdirectoryinallofthesegments,specifythelocalfilewiththe -u (--upload )optionandthedestinationfilewiththe -f (–file`)option.
The -k ( --solr_prop )and -s ( --solr_val )parametersmustalwaysbeusedtogether.TheysetacustomGPTextpropertyinthe solr.xml configurationfile.Currently,theyareonlyusedtoenableordisablecommittracking.Changesto solr.xml donottakeeffectuntiltheGPTextinstanceisrestarted.
Examples1. Editthe managed-schema fileinindex wikipedia.public.articles ,usingthevieditor:
gptext-config-fmanaged-schema-iwikipedia.public.articles-evi
2. Appendthefile stopwords.add to stopwords.txt inindex wikipedia.public.articles :
gptext-config-astopwords.add-fstopwords.txt-iwikipedia.public.articles
Nevereditthetemplateconfigurationfiles.Ifyoudo,everyindexyoucreateaftereditingthetemplateswillbecreatedwithyourmodifiedversions.Usethe gptext-config utilitytoensurethatyouareeditingtheconfigurationfilesforyourindex,ratherthanthetemplateconfigurationfiles.
©CopyrightPivotalSoftware,Inc,2013-2017 90 2.1.2
3. Revertfile managed-schema inindex wikipedia.public.articles aftereditingit.
gptext-config-fmanaged-schema-iwikipedia.public.articles-r
4. Uploadthelocalfile custom.txt totheZooKeeperfile custom.conf inindex wikipedia.public.articles :
gptext-config-ucustom.txt-fcustom.conf-iwikipedia.public.articles
5. Uploadjarfile text.jar tothe lib directoryintheGPTexthomedirectory:
gptext-config-jtext.jar
6. SetJVMoptions:
gptext-config-o"-Xms256M-Xmx400M"
gptext-expandExpandsaGPTextclusterbyaddingnewGPTextnodestoexistinghostsinaGPTextclusterortohostsaddedbytheGreenplumDatabase gpexpandmanagementutility.ReplicasforindexescreatedafterthenewGPTextnodesareaddedwillbedistributedacrossthenewandexistingnodes.Documentsmustbereindexedtorebalancereplicasonexistinghostsor,afterexpandingtheGreenplumcluster,toredistributetheindextonewshards.
Synax
gptext-expand-h
gptext-expand-e-p<paths>[-d<database>][-v]
gptext-expand-H<new-hosts>[-d<database>][-v]
Parameters
-h
--help Displaysausagemessageandexits.
-e
--existingAddsGPTextnodestoexistinghostsintheGPTextcluster.The`-p`optionmustalsobesuppliedtospecifythedatadirectoriesforthenewnodes.
-p
--expand_paths
SpecifiespathstodirectorieswherethenewGPTextnodes’datadirectoriesaretobecreated.ThesedirectoriesshouldbeparalleltotheGreenplumDatabasesegmentdatadirectories.Ifthereismorethanonedirectory,placetheminacomma-delimitedlist,forexample-p /data1/nodes,/data1/nodes,/data2/nodes .Requiredwhenexpandingonexistinghosts.
-H
--new_hosts
SpecifiesthenewhostsonwhichGPTextistobeinstalled.Placemultiplehostnamesinacomma-delimitedlist,forexample-H host1,host2,host3 .SeeNotesforrequirementsfornewhosts.
-d
--database SpecifiesthenameofadatabasecontainingGPTextschema.
-v
--verbose Displaysdebugoutput.
©CopyrightPivotalSoftware,Inc,2013-2017 91 2.1.2
NotesThe -p and -d optionscannotbeusedtogether.
WhennewhostsareaddedtotheGreenplumDatabasecluster,ensurethatthefollowingGPTextprerequisitesareinstalledbeforerunning gpexpand :
OracleJava1.8Python2.6orgreaterLinux lsof utilityAllhostsintheclustermustbeabletoreachthenewandexistinghosts.
Existingreplicasarenotautomaticallyredistributed.TorebalancereplicasamongtheexpandedGPTextcluster,youmustreindex.
Whenexpandingtonewhosts,youmustreindextoredistributetheindexamongexistingandnewshards.
gptext-installsqlInstallsorremovesthegptextschemaanduser-definedfunctionsindatabases.
Syntax
gptext-installsql-h
gptext-installsql[-c][-v]db_name[db2_name...]
Parameters
-c
--clean RemovesthegptextschemaandUDFsfromthespecifieddatabases.
-h
--helpDisplaysausagemessageandexits.
-v
--verbose Displaysdebugoutput.
NotesThe gptext schemaisreservedforusebyGPText.The gptext-installsql utilitydropsandrecreatestheschema.IfyouaddanydatabaseobjectstotheschematheywillbelostwhenyoureinstalltheschemaorupgradetheGPTextsystem.
Examples1. InstallGPTextUDFsindatabases wikipedia and twitter .
$gptext-installsqlwikipediatwitter20160603:14:03:53:305130gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Creating'gptext'schemaandUDFsindatabasewikipedia...20160603:14:03:53:305130gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Creating'gptext'schemaandUDFsindatabasetwitter...20160603:14:03:54:305130gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Validatinggptextinstallation20160603:14:03:59:305130gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Done.
2. DeleteGPTextUDFsindatabase wikipedia .
©CopyrightPivotalSoftware,Inc,2013-2017 92 2.1.2
$gptext-installsql-cwikipedia20160603:14:03:04:304847gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Connectingtodatabasewikipedia20160603:14:03:05:304847gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Dropping'gptext'schemaandUDFs...20160603:14:03:05:304847gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Validatingcleanoperation20160603:14:03:09:304847gptext-installsql:gpdb-sandbox:gpadmin-[INFO]:-Done.
gptext-migratorMigratesthecurrentGPTextsystemintoanupgradedGreenplumDatabasecluster.
Syntax
gptext-migrator[-h|--help]
gptext-migrator[-v|--verbose]
NotesThe gptext-migrator utilityrelocatesthecurrentGPTextsystemtoanewGreenplumDatabaserelease.
TheutilitydeterminesthedestinationGreenplumDatabasereleasefromtheenvironment.IftheGPTextsystemhasalreadybeenmigrated,orifthedestinationGreenplumreleaseisunsupported, gptext-migrator outputsamessageandquits.
IfyouareupgradingGPTextandGreenplumDatabaseatthesametime,completetheGreenplumDatabaseupgradefirst,andthenuse gtext-migrator toaddthecurrentGPTextversiontothenewGreenplumDatabaseinstallation.Finally,use gptext-upgrade toupgradethesystemtothenewGPTextversion.
gptext-recoverRecoversGPTextnodes.
Syntax
gptext-recover-h
gptext-recover-f[-v]
gptext-recover-H<new-host1>,<new-host2>,...[-v]
gptext-recover-r[-v]
Parameters
-h
--help Displaysausagemessageandexits.
-f
--force
ForcesrecoveryforanyGPTextnodesthataredown.Ifthenodeisunrecoverable,deletesthenode,createsanewnode,andrecreatesreplicas.
-H
--new-hostsRecoverdownnodesonnewhosts.Forexample“host1,host2”.Thenumberofnewhostsmustbeequaltothenumberoffailedhosts.
-r
©CopyrightPivotalSoftware,Inc,2013-2017 93 2.1.2
--index_replicas Recoverreplicas,butdonotrecoveranydownnodes.
-v
--verbose Displaysdebugoutput.
NotesThe -f and -H optionscannotbeusedatthesametime.
Ifshardsaredown, gptext-recover advisesyoutoreindex.
Ifnoshardsaredown, gptext-recover restoresanyreplicasthataredown.
IfanyGPTextnodesrecoveredusingthe -f or -H optionsfailtostart,thereplicascannotberecovered.Ifthisshouldhappen,resolvethestartupproblemwiththenewlycreatednodes,andthenrecoverthereplicasusingthe gptext-recover-
roption.ItisimportanttorecoverreplicaswhenallGPText
nodesarehealthysothatreplicaswillbedistributedevenlyamongthenodes.
gptext-replicaAddordeleteareplicaforanindexshard.
Syntax
gptext-replica-h
gptext-replicaadd-iindex-name-sshard[-nnode]
gptext-replicadrop-iindex-name-sshard-rreplica[-o]
Parameters
-h
--help Displaysausagemessageandexits.
-iindex-name
--index=index-name Nameoftheindex.
-ffilename
--file=filename
Thenameofafiletoedit,append,orupload.The -i optionmustbeincludedtospecifytheindex.Thefollowingfilesaresupported:
solrconfig.xml –ContainsmostoftheparametersforconfiguringSolritself(see[https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml](http://wiki.apache.org/solr/SolrConfigXml)).
schema.xml –DefinestheanalysischainsthatSolrusesforvariousdifferenttypesofsearchfields(seeSettingupTextAnalysisChains).
stopwords.txt –Listswordsyouwanttoeliminatefromthefinalindex.Youcanalsoeditlanguagespecificstopwordsbyspecifyingafilenameintheformat stopwords_language_code.txt ,where language_code isatwo-charactercodesuchas en , fr ,or es .
protwords.txt –Listsprotectedwordsthatyoudonotwanttobemodifiedbytheanalysischain.Forexample,iPhone.
synonyms.txt –Listswordsthatyouwantreplacedbysynonymsintheanalysischain.
©CopyrightPivotalSoftware,Inc,2013-2017 94 2.1.2
emoticons.txt –Definesemoticonsforthe text_sm socialmediaanalysischain.Seegptext-start.
currency.txt –Definesexchangeratesbetweenonecurrencyandanother(seeWorkingwithCurrenciesandExchangeRates attheApacheSolrwebsite).
jar_file–thenameofajarfiletouploadtoGPText_Install_Directory/lib/.
-ecommand
--editor=commandEditortouse.Choicesareanyeditorthattakesafilenameonthecommandlineasaparameter.Forexample,vi,vim,emacs,nano,etc.Ifabsent,viisused.
-afilename
--append=filename
Appendsanamedfiletoaconfigurationfileanddistributestheresultingfiles.Requiresthe -f and -i parameters. -f namestheconfigurationfiletowhichyouwanttoappendthefilenamed(includinglocalpath)withthe -aparameter.
-r
--revert
filename
Revertnamedfiletopreviousversion.
-iindex
--index=index Required.Thenameoftheindex.
-sshard
--shard=shard Required.Thenameoftheshardtoaddareplicato.
-nnode
--node=node Optional.Thenodewherethereplicaistobeadded.
-rreplica
--replica=replica Requiredforthedropcommandonly.Thenameofthereplicatodrop.
-o
--onlyifdown Optional.Usedonlywiththedropcommand.Onlydropthereplicaifit’sdown.
NotesTofindthenameofareplicatodrop,check gptext.index_status() .Thenameis core_nodeX whereXisanumber.
Examples1. Addareplicaforindex wikipedia.public.articles inshard shard0 ,onnode node1 .
gptext-replicaadd-iwikipedia.public.articles-sshard0-nnode1
2. Dropthereplicanamed core_node1 forindex wikipedia.public.articles inshard shard0 ifthereplicaisdown.
gptext-replicadrop-iwikipedia.public.articles-sshard0-rcore_node3-o
gptext-restoreRestoreaGPTextindexfromabackupcreatedonasharedfilesystemwiththe gptext-backup utility.
©CopyrightPivotalSoftware,Inc,2013-2017 95 2.1.2
Syntax
gptext-restore-h
gptext-restore-i<index_name>-p<path>[-v]
Parameters
-h
--help Displaysausagemessageandexits.
-ppath
--pathpath Thepathtothebackupdirectoryoneachhost.
-iindex_name
--indexindex_nameThenameoftheGPTextindextorestore.TheindexmustnotalreadyexistinthetargetGPTextsystem.
Example
gptext-restore-imyindex-p/mnt/backupfs/gptext-backups/mybackup
NotesUsethe gptext-restore utilitytorestoreaGPTextindexbackupfromasharedfilesystem.YoucanrestorethebackuptoanewGPTextsystemorrestoreabackuptorecoveracorruptedGPTextindex.
Theindexyouarerestoringmustnotexist.Ifyouarerestoringanindextorecoveracorruptedindex,youmustfirstdeletetheexistingindexwiththegptext.delete() UDF.The gptext-restore utilitycreatesanewindexandwilloutputanerrorandquitiftheindexyouarerestoringexists.
gptext-startStartsorrestartstheGPTextcluster.
Syntax
gptext-start-h
gptext-start[-r][-s][-v]
Parameters
-h
--help Displaysausagemessageandexits.
-r
--restart RestartstheGPTextcluster.
-s
©CopyrightPivotalSoftware,Inc,2013-2017 96 2.1.2
--slow_start RestartstheGPTextclusterbystartingindexesoneatatime.
-v
--verbose Displaysdebugoutput.
NotesThe gptext-start-
rcommandcallsthe solr
restartcommandtostopandrestartalloftheSolrinstancesinthecluster.TheGPTextutilitydeterminesifthe
processesarerunningbeforeitcompletes,butitcannotverifythatalloftheSolrprocesseswerestopped.IfitisimportanttobecertainthatSolrprocesseswerestopped,forexampleifyouhavechangedtheJVMoptions,use gptext-stop followedby gptext-start insteadof gptext-start-
r.
The -s ( --slow-start )optionisrecommendedifyouhavealargenumberofindexes.Bydefault,whenaSolrclusterstartsallofthecluster’sindexesarestartedatonce.Withalargenumberofindexes,thenumberofinitialZooKeeperrequestscanresultintimeouterrorsandpossiblefailuretostarttheclustersuccessfully.Withthe -s option,GPTextperformsarollingstart,startingindexesoneatatime,toreduceZooKeepercontentionandallowamorestablestartup.Ifyouhavemorethan50indexesanddonotspecifythe -s option, gptext-start displaysawarningmessageandrequiresyoutoconfirm.Withthe -s option, gptext-start doesnotreturnuntilallindexeshavebeenstarted;withoutthe -s option,the gptext-start commandreturnsimmediately.
Examples1. StarttheGPTextcluster.
gptext-start
2. RestarttheGPTextcluster.
gptext-start-r
gptext-stateDisplaysthestateoftheGPTextclusterandGPTextindexes.
Syntax
gptext-state-h
gptext-state[-d<db-name>][-D][-v]
gptext-state-i<index-name>[-d<db-name>][-c<col1,...>][-v]
gptext-statelist[-d<db-name>][-v]
gptext-statehealthcheck[-d<db-name>][-f<percent>][-v]
Parameters
-h
--help Displaysausagemessageandexits.
-ddb-name
--database=db-name
ThenameofadatabasecontainingtheGPTextschema.
gptext-state searchesalldatabasesforthefunctionsitneedstorun.Iftheuserdoesnothaveaccesspermissiontothedatabaseitbeginswith,it
©CopyrightPivotalSoftware,Inc,2013-2017 97 2.1.2
fails.Inthiscase,usethe --database= parametertospecifyanaccessibledatabasetosearch.
-D
--detailsListthestatusforeachGPTextindex.Whenomitted, gptext-state listscountsofthenumbersofindexeswithGreen,Yellow,andRedstatuses.
-iindex-name
--index=index-name
Thenameofanindex.Displaysstatisticsforthespecifiedindex.Iftheindex-nameisarootorchildpartition,displaysanyparentorchildpartitions.Thisoptioncannotbeusedwiththe list or healthchecksubcommands.
-ccolumn-list
--stats_columns=
Usedwiththe -i or --index option,specifiesacomma-separatedlistofstatisticstodisplay.Thelistmaycontain replication_factor ,max_shards_per_node , num_docs ,and size_in_bytes .Ifno -c or--stats_columns optionissupplied,allfourstatisticsaredisplayed.
-fdiskfree
--disk_free=diskfreeUsedwiththe healthcheck command,specifiesthepercentagediskfreerequiredperhosttoreportahealthyGPTextcluster.Thedefaultis10.
NotesAllparametersareoptional,exceptthat -i ( --index )isrequiredwhenyouspecify --c ( --stats_columns ).
Ifyouspecifyasubpartitionnamewiththe -i option, gptext-state displaysthenameoftheparenttableorpartitionfromwhichthepartitioninherits.Ifyouspecifythenameofatableorpartitionwithchildpartitions, gptext-state liststhem.
Whenexecutedwithnoarguments, gptext-state displaystheGPTextversionandcountsofindexesintheGreen,Yellow,andRedstates.
AGreenstatemeansthatallshardsandreplicasarehealthy.
AYellowstatemeansthatallshardsareavailable,butoneormorereplicasisdown.
ARedstatemeansthatonemoremoreshardsisdown.
Withthe -D ( --details )optionspecified, gptext-state listsallGPTextindexeswiththecolumns database , index_name ,and state .The state columndisplaysthestatusoftheindexas Green , Yellow ,or Red .
IfanyindexhasaYelloworRedstatus, gptext-state returnsanon-zerovalue.
Examples1. ShowtheGPTextclusterstate.
$gptext-state20161216:14:01:32:029224gptext-state:gpsne:gpadmin-[INFO]:-Checkzookeeperclusterstate...20161216:14:01:32:029224gptext-state:gpsne:gpadmin-[INFO]:-CheckGPTextclusterstatus...20161216:14:01:33:029224gptext-state:gpsne:gpadmin-[INFO]:-CurrentGPTextVersion:2.0.020161216:14:01:33:029224gptext-state:gpsne:gpadmin-[INFO]:-Allnodesareupandrunning.20161216:14:01:34:029224gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------20161216:14:01:34:029224gptext-state:gpsne:gpadmin-[INFO]:-Indexstate.20161216:14:01:34:029224gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------20161216:14:01:34:029224gptext-state:gpsne:gpadmin-[INFO]:-stateindexcount20161216:14:01:34:029224gptext-state:gpsne:gpadmin-[INFO]:-Green4
2. ShowtheGPTextclusterstatewithdetails,specifying wikipedia asadatabasecontainingtheGPTextschema.
$gptext-state-D-dwikipedia20160603:13:54:27:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextclusterstatus...20160603:13:54:27:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CurrentGPTextVersion:2.0.020160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Allnodesareupandrunning.20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Indexstate.20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-databaseindex_namestate20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-wikipediawikipedia.public.articlesGreen20160603:13:54:28:302662gptext-state:gpdb-sandbox:gpadmin-[INFO]:-gptextdocgptextdoc.public.docsGreen
©CopyrightPivotalSoftware,Inc,2013-2017 98 2.1.2
3. Show replication_factor and num_docs statisticsfortheGPTextindex wikipedia.public.articles .Specify wikipedia asthedatabasewiththeGPTextschema.
$gptext-state-iwikipedia.public.articles-creplication_factor,num_docs-dwikipedia20160603:13:57:16:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextclusterstatistics...20160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:-ReplicasUp:620160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------20160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Indexwikipedia.public.articlesstatistics.20160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:------------------------------------------------20160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:-replication_factornum_docs20160603:13:57:18:303262gptext-state:gpdb-sandbox:gpadmin-[INFO]:-311
4. Listallindexes.
$gptext-statelist20160603:13:58:10:303550gptext-state:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------------20160603:13:58:10:303550gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Indexlist20160603:13:58:10:303550gptext-state:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------------20160603:13:58:10:303550gptext-state:gpdb-sandbox:gpadmin-[INFO]:-gptextdoc.public.docs20160603:13:58:10:303550gptext-state:gpdb-sandbox:gpadmin-[INFO]:-wikipedia.public.articles
5. Performahealthcheckwitha20%freediskrequirement.
$gptext-statehealthcheck-f20-dwikipedia20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-ExecutehealthcheckonGPTextcluster!20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextconfigfiles...20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-CheckGPTextindexstatus...20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160603:13:58:56:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforrequireddiskspace...20160603:13:58:57:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160603:13:58:57:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforrequireduserprivileges...20160603:13:58:57:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160603:13:58:57:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checkingforindexesanddatabaseconsistency...20160603:13:58:58:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD20160603:13:58:58:303655gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.
6. Checkthestatusofapartitionedtable.
$gptext-state-idemo.public.tweets20170427:07:22:45:014741gptext-state:mdw:gpadmin-[INFO]:-ExecuteGPTextstate...20170427:07:22:45:014741gptext-state:mdw:gpadmin-[INFO]:-Checkzookeeperclusterstate...20170427:07:22:45:014741gptext-state:mdw:gpadmin-[INFO]:-CheckGPTextclusterstatistics...20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-ReplicasUp:420170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:------------------------------------------------20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-Indexdemo.public.tweetsstatistics.20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:------------------------------------------------20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-replication_factormax_shards_per_nodenum_docssizeinbytes20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-2430794448507520170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-Childpartitionindexes:20170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_120170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_220170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_320170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_420170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_520170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_620170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_720170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_820170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_920170427:07:22:46:014741gptext-state:mdw:gpadmin-[INFO]:-demo.public.tweets_1_prt_10...
gptext-stopStoptheGPTextclusternodes.
Syntax
©CopyrightPivotalSoftware,Inc,2013-2017 99 2.1.2
gptext-stop-h
gptext-stop[-v][-f]
Parameters
-h
--help Displaysausagemessageandexits.
-v
--verbose Displaysdebugoutput.
-f
--force ForcefullystopsallSolrprocesses.
Examples1. StoptheGPTextcluster.
gptext-stop
2. ForcestoptheGPTextcluster.
gptext-stop-f```
gptext-uninstallUninstallsGPText,includingdataandinstalledfiles.UninstallsZooKeepernodesiftheywereinstalledwiththeGPTextinstaller.
StopsanyrunningGPTextinstances.
DeletesallSolrdirectoriesinsegmentdirectories.
Deletestheinstallationdirectory.
RemovesallGPTextschemasandindexesfromalldatabases.
UninstallsZooKeeperifitwasinstalledwiththeGPTextinstaller.
Syntax
gptext-uninstall-h|--help
gptext-uninstall[-v|--verbose]
Parameters
-h
--help Displaysausagemessageandexits.
-r
--restart RestartstheGPTextcluster.
©CopyrightPivotalSoftware,Inc,2013-2017 100 2.1.2
-v
--verbose Displaysdebugoutput.
NotesTouse gptext-uninstall ,youmusthavesuperuserpermissionsonalldatabaseswithGPTextschemas.
gptext-uninstall runsonlyifthereisatleastonedatabasewithaGPTextschema.
Examples1. UninstallGPText.
gptext-uninstall
gptext-upgradeUpgradesthecurrentGPTextsystemtoanewGPTextrelease.
Syntax
gptext-upgrade[-h|--help]
gptext-upgrade[-f<upgrade_file>|--file=<upgrade_file>][-c|--base_check][-v|--verbose]
Parameter Description
-h
--help Displaysausagemessageandexits.
-fupgrade_file
--fileupgrade_file
Providesthepathtotheupgradefile.Thedefaultupgradefileis $GPPERFMONHOME/share/upgrade.yaml .
-c
--base_check
Bydefault, gptext-upgrade checksthattheGPTextenvironmentcanbeupgradedandreportsanyitemsthatmustbecorrectedbeforeupgrading.Whenthe -c or --base-check optionissupplied,theenvironmentcheckisomitted.
-v
--verbose Displaysdebugoutputwhenexecutingthecommand.
NotesTheupgrade_fileisaYAML-formattedscriptdefiningactionstoupgradeaGPTextsystemfromapreviousreleasetothecurrentrelease.Thefileisnotintendedtobeeditedbyusers.Iftheupgrade_filedoesnotcontainsupportforthepreviousGPTextrelease, gptext-upgrade outputsanerrormessageandexits.
©CopyrightPivotalSoftware,Inc,2013-2017 101 2.1.2
zkManagerCheckstheZooKeeperclusterstate.IfZooKeeperwasinstalledwithGPText, zkManager canstartorstoptheZooKeepercluster.
Syntax
zkManager[-h|--help]
zkManagerstate[-v|--verbose]
zkManagerstart[-v|--verbose]
zkManagerstop[-v|--verbose][-f|--force]
Parameters
-h
--help Displayausagemessageandquit.
-f
--force Whenusedwiththe stop command,performsaforcedstop.
-v
--verbose Displaysdebugoutputwhenexecutingthecommand.
NotesThe zkManager start and zkManager stop commandsareonlyavailableiftheZooKeeperclusterwasinstalledbytheGPTextinstaller.
Bydefault,all gptext-* utilitieschecktheZooKeeperclusterstate.Iftheclusterisnothealthy,theZooKeeperstateinformationisdisplayedtowarntheuser.
The nc (netcat)commandmustbeinstalledonthemasterhost.Run nc inaterminaltoensurethecommandisinstalled.
Examples1. StarttheZooKeepercluster,ifZooKeeperwasinstalledbytheGPTextbinary:
zkManagerstart
2. StoptheZooKeepercluster,ifZooKeeperwasinstalledbytheGPTextbinary:
zkManagerstop
3. ForcestoptheZooKeepercluster,ifZooKeeperwasinstalledbytheGPTextbinary:
zkManagerstop-f
4. CheckthestateoftheZooKeepercluster:
$zkManagerstate20160603:14:17:01:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-Executezookeeperstateprocess.20160603:14:17:01:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-HostportLatencymin/avg/maxMode20160603:14:17:01:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-gpdb-sandbox.localdomain21880/0/17follower20160603:14:17:01:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-gpdb-sandbox.localdomain21890/0/17leader20160603:14:17:01:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-gpdb-sandbox.localdomain21900/0/70follower20160603:14:17:06:307386zkManager:gpdb-sandbox:gpadmin-[INFO]:-Done.
©CopyrightPivotalSoftware,Inc,2013-2017 102 2.1.2
©CopyrightPivotalSoftware,Inc,2013-2017 103 2.1.2
GPTextandSolrDataTypeMappingsThefollowingtablemapsGreenplumDatabasedatatypestoSolrdatatypes.
IfaGreenplumDatabasedatatypeisnotlisted,itisa text typeinSolr.
IfaGreenplumDatabasedatatypeisanarrayitismappedtoamulti-valuetypeinSolr.Forexample, INT[]
mapstoamulti-value int Solrfield.
GreenplumDatabaseType SolrType
bit string
bool boolean
bytea binary
char string
date tdate
float4 float
float8 double
int int
int2 int
int4 int
int8 long
interval string
money string
name string
numeric double
point point
text text
time string
timestamp tdate
timestamptz tdate
timetz string
uuid uuid
varbit string
varchar text
©CopyrightPivotalSoftware,Inc,2013-2017 104 2.1.2
GPTextConfigurationParametersGPTextconfigurationparameterscanbeoverriddenbysettinganewvalueinaGreenplumDatabasesession.ChangesmadetoconfigurationparametersonlyaffectfutureGPTextoperations;existingindexesusetheparametervaluesthatweresetwhentheywerecreated.
SeeChangingGPTextServerConfigurationParametersforinformationaboutchangingconfigurationparametersandexamples.
ThefollowingtableliststheGPTextconfigurationparameterswiththeirdefaultsandvalueconstraints.
admin_timeout Timeout,inseconds,foradminrequests(create_index,etc.). 30 INT_MAX 3600
commit_timeout Timeout,inseconds,forpreparecommitandcommitoperations. 30 INT_MAX 3600
delete_timeout Timeout,inseconds,fordeleterequests. 30 INT_MAX 3600
extension_factorMaximumnumberofreplicasthatcanbeaddedforanindexperGPTextnodeaftertheindexiscreated.
0 10 2
facet_timeout Timeout,inseconds,forfacetingqueries. 30 INT_MAX 3600
failover_factorMinimumratioofSolrnodesthatmustbeupinordertocreateanewindex.(SolrNodesUp/TotalSolrNodes )
0.0 1.0 0.8
hl_post_tag Markupthat gptext-highlight() insertsaftertermsinsearchresults. '</em>'
hl_pre_tag Markupthat gptext-highlight() insertsbeforetermsinsearchresults. '<em>'
idx_buffer_size Sizeofindexingbufferinbytes. 4096 67108864 134217728
idx_delim Delimitertouseduringindexing.
comma
','
idx_encapsulatorThecharacteroptionallyusedtosurroundvaluestopreservecharacterssuchastheCSVseparatororwhitespace.
quote
'"'
idx_escape Escapecharactertouseforindexing.
backslash
'\\'
index_timeout Timeout,inseconds,forreceivingresponsetoindexingoperation. 30 INT_MAX 3600
optimize_timeout Timeout,inseconds,foroptimizeoperations. 30 INT_MAX 3600
ping_timeout Timeout,inseconds,forpingrequests. 30 INT_MAX 120
replication_factor Thenumberofreplicaspershardforanewlycreatedindex. 0 10 2
replication_timeout Timeout,inseconds,forreplicationoperations(backup,restore). 30 INT_MAX 43200
rollback_timeout Timeout,inseconds,forrollbackoperations. 30 INT_MAX 3600
search_batch_size Batchsizeforsearchrequests. 1 INT_MAX 2500000
search_buffer_size Buffersizeforsearchresults,inbytes. 4096 67108864 16777216
search_param_separator Delimitertouseinthe options parameterofthe gptext.search() UDF. '&'
search_post_buffer_size Postbuffersizeforsearch,inbytes. 512 4194304 4096
search_timeout Timeout,inseconds,forsearches. 30 INT_MAX 600
stats_timeout Timeout,inseconds,forobtainingstatistics. 30 INT_MAX 600
terms_batch_size Batchsizefortermsoperations. 1 INT_MAX 1000
©CopyrightPivotalSoftware,Inc,2013-2017 105 2.1.2