
  1.4.1

  Contents

    Introducing Collibra Connect
    About Collibra Connect
    Collibra Connect deployment
    Installing Collibra Connect
    Integrations with Collibra Connect: from development to production
    Installing a development environment
    System requirements - Anypoint Studio
    Hardware requirements
    Java Runtime Environment
    Operating Systems
    Install Anypoint Studio
    Install the Collibra DGC Connector in Anypoint Studio
    Prerequisites
    Steps
    Import the Collibra domain in Anypoint Studio
    Prerequisites
    Steps
    Result
    Installing a test and production environment
    System requirements - Mule ESB runtime
    Hardware requirements
    Java Runtime Environments
    Operating Systems
    Install Mule ESB Server runtime
    Prerequisites
    Steps
    Run Collibra Connect in Mule ESB
    Prerequisites
    Steps
    Collibra domain deployments in Mule ESB Standalone server
    Install the Mule Management Console
    Prerequisites
    Steps
    Controlling a Mule ESB Server runtime instance
    Linux
    Windows
    About the Collibra Connect components
    Collibra domain application
    Collibra domain purpose
    Collibra domain configuration
    About the gateway application
    Gateway as single point of entry
    Gateway for authentication
    Gateway integration with workflow processes
    Collibra DGC Connector
    About upserting assets
    About upserting assets by ID
    Upserting assets by external ID
    About upserting assets by name
    Upserting assets by name
    Configure the Collibra DGC Connector connection
    Configure the parameters for upsert by external ID
    Configure the parameters for upsert by name
    About the payload of an upsert operation
    UUIDs
    Related assets
    About assigning responsibilities
    About the upsert parameters
    General properties
    DataSense Explorer
    Fixed properties
    Dynamic properties
    Configuring Collibra Connect to connect to a proxied Collibra DGC
    Enable debug logging
    Override HTTP PUT and DELETE methods with HTTP POST
    Running integrations in Collibra Connect
    Import an integration template
    Prerequisites
    Steps
    Run integrations with Collibra Connect in Anypoint Studio
    Steps
    Run integrations with Collibra Connect in Mule ESB Standalone server
    Triggering an integration
    Manually start an integration using the Collibra domain and gateway
    Manually start an integration: options
    Starting an integration without the Collibra domain and gateway
    Setting the runtime environment of integration templates
    Check the Mule ESB runtime version
    Change the embedded Mule runtime
    Install a new Mule runtime version
    Steps
    Result
    Developing a custom integration
    University development course
    Designing an integration template
    Overview use case
    Overview possible scenarios
    Configuring integration templates
    Importing CSV data
    Upserting assets
    Enriching assets with mapping
    Generate natural key
    Enrich with mapping
    Converting data to CSV
    Importing CSV data in Collibra DGC
    Table view configuration
    Learn about the Collibra Connect components
    Use gateway with a custom integration template
    Prerequisites
    Steps
    Result
    About the gateway entry and end point
    Entry point
    End point
    Connecting to the gateway from a workflow
    Create a connection to the gateway
    Connect to the gateway from a workflow
    Sending information to the workflow
    Processing a message in a workflow
    Scheduling the trigger of an integration
    Configuring the trigger interval
    Configuring the integration template to start
    Configuring the payload
    Starting the integration template
    Edit Collibra integration templates in Anypoint Studio
    Collibra Marketplace
    Frequently Asked Questions
    Do I need a license for Collibra Connect?
    What are the current versions of Collibra Connect?
    Are there trainings provided?
    Is Collibra Connect Installed on the Same Server of Collibra DGC?
    What is the application server Collibra Connect runs on?
    How does Collibra Connect communicate with SaaS Systems such as Workday, ServiceNow and Salesforce.com ...?
    How does Collibra Connect communicate with on-premise Systems such as SAP and Oracle Financial etc.?
    Is there any Intrusion Detection Mechanism in place?
    How is patching performed? Is there Auto Update?
    What is the rollback process?
    How about authentication, authorization, access control and data security?
    What do I need to do when one of the applications I'm integrating with requires a secure (SSL) connection to self-signed certificate?
    How do I check which version of the Collibra DGC Connector is used?
    What if I need to go through a proxy server in order to connect to Collibra DGC from Collibra Connect?
    How can I know which REST calls are made by the Collibra DGC Connector during execution of given integration flow?
    How can I set verbose exception stacktrace?
    How can I see the actual HTTP call message data?
    Glossary
    Index


    Introducing Collibra Connect

    Collibra Connect offers you a way to integrate your own application with Collibra Data Governance Center to enable active data governance.

    Tip: Collibra DGC Connector 1.4.1 supports Collibra DGC 5.1 or newer. If you are running an older version of Collibra DGC, use Collibra DGC Connector 1.3.1.

    About Collibra Connect
    Collibra Connect deployment

    CHAPTER 1


    About Collibra Connect

    Collibra Connect is an integration platform that enables integrations between Collibra Data Governance Center and other third-party products, such as Informatica, Salesforce.com and JIRA.

    Collibra Connect includes Anypoint Studio, an Eclipse-based development environment for developing new integrations and enhancing existing ones. Collibra Connect also includes the Mule ESB Standalone server, the server application on which the integration applications are deployed and run after they are developed in Anypoint Studio.

    You can use Collibra Connect to:

    - Leverage your existing infrastructure by reading metadata from your current systems.
    - Automate compliance.
    - Provision changes resulting from an issue resolution or data modification.
    - Implement active data governance.

    Collibra Connect deployment

    The following schema shows a typical deployment of Collibra Connect, Collibra Data Governance Center and an external system.

    Collibra Connect consists of two major components:

    - Integration flows, often referred to as (integration) templates, which define the mapping between Collibra DGC and the external system.
    - The Collibra DGC Connector, which communicates with the Collibra DGC REST API.


    In the schema, you can distinguish the following elements:

    Element   Description

    1         Collibra Data Governance Center.

    2         Collibra Connect. In the production environment, it is recommended to deploy the Mule ESB (Collibra Connect) server on a separate machine (physical or virtual) to ensure that Collibra DGC and Collibra Connect do not share memory or CPU resources. Additionally, the port on which Collibra Connect listens must be open and reachable from the Collibra DGC server, so that Collibra DGC can trigger integration flows. You can decide and configure which port that is, for example 443 for standard HTTPS.

    3         Collibra Connect integration templates communicate with Collibra DGC using the Collibra DGC Connector. The Collibra DGC Connector encapsulates the REST API calls to Collibra DGC to provide session management support and streamline operations. Collibra DGC workflows can also communicate with Collibra Connect, to start integration flows or to receive BPMN message events. See Connecting to the gateway from a workflow.

    4         Integration flows. Each integration flow performs integration logic (migration, broadcast ...) between Collibra DGC and one or more external systems. You can find the documentation and download integration templates on the delivery site.

    5         External systems that are to be integrated with Collibra DGC. The communication with the external system can happen through:
              - A custom connector to the external system, for example the IBM BG Connector.
              - A REST API of the external system.
              - A relational database connector, to connect to a relational database.
              - Another file-based connector, such as FTP, to read a file from a remote location.
              - Any other method that the system allows.
              In general, you can integrate any system that has accessible APIs with Collibra DGC through Collibra Connect.


    6         Starting the integration. There are numerous ways to start an integration:
              - The integration starts periodically, with a configurable frequency.
              - The integration starts automatically after a change in Collibra DGC or in the external system:
                - A workflow in Collibra DGC is started automatically by a workflow-starting event, for example when an asset is added or modified. In turn, the started workflow triggers the required integration flow.
                - With a built-in mechanism, an integration flow periodically checks the external system. If it detects a change, it starts the actual integration flow.
                - The external system itself triggers the call to start the required integration flow when necessary. This is only possible if the external system can execute a REST call or register an event callback.
              - A user starts a workflow in Collibra DGC that in turn calls the integration flow in Collibra Connect.
              - A workflow in Collibra DGC calls the integration flow at a recurring interval.
              - A user triggers the integration flow manually when the most up-to-date view of the data is required. The user can do this by sending the trigger request (a REST call) to the gateway, see About the gateway application. This request can be made from any computer, using tools such as cURL or Postman.
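    The built-in polling mechanism in row 6 (an integration flow that periodically checks the external system and starts the actual integration flow only when it detects a change) can be illustrated with a minimal Python sketch. It is not Collibra Connect code: `fetch_snapshot` and `start_integration` are hypothetical hooks for reading the external system and triggering the flow, and a "change" is assumed to mean a different SHA-256 fingerprint of a JSON-serializable snapshot.

```python
import hashlib
import json
from typing import Any, Callable, Optional


def fingerprint(snapshot: Any) -> str:
    """Return a stable SHA-256 digest of a JSON-serializable snapshot."""
    # sort_keys makes the digest independent of dictionary key order
    encoded = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def poll_once(
    fetch_snapshot: Callable[[], Any],
    start_integration: Callable[[Any], None],
    last_fingerprint: Optional[str],
) -> str:
    """Check the external system once and start the integration only on a change.

    fetch_snapshot and start_integration are hypothetical hooks, not
    Collibra Connect APIs. Returns the fingerprint to pass to the next poll.
    """
    snapshot = fetch_snapshot()
    current = fingerprint(snapshot)
    if current != last_fingerprint:
        start_integration(snapshot)
    return current
```

    A scheduler would call `poll_once` at the configured frequency and keep the returned fingerprint between runs. The first poll always starts the integration, because no earlier fingerprint exists.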


    Installing Collibra Connect

    The installation of Collibra Connect consists of adding the Collibra DGC Connector to Anypoint Studio or Mule ESB Standalone server.

    Note: Anypoint Studio is used for development and configuration purposes; Mule ESB Standalone server is used for production purposes.

    Integrations with Collibra Connect: from development to production
    Installing a development environment
    System requirements - Anypoint Studio
    Install Anypoint Studio
    Install the Collibra DGC Connector in Anypoint Studio
    Import the Collibra domain in Anypoint Studio
    Installing a test and production environment
    System requirements - Mule ESB runtime
    Install Mule ESB Server runtime
    Run Collibra Connect in Mule ESB
    Collibra domain deployments in Mule ESB Standalone server
    Install the Mule Management Console
    Controlling a Mule ESB Server runtime instance

    CHAPTER 2


    Integrations with Collibra Connect: from development to production

    As with any development process, it is recommended to create integrations with Collibra Connect in multiple phases.

    The diagram depicts a sample deployment environment for Collibra Connect with three tiers: a development, a testing and a production environment.

    Each environment has its own purpose:

    Environment   Purpose

    Development   The development environment is used to develop the integration with Collibra Connect in Anypoint Studio. In this environment, you can use a test environment of Collibra Data Governance Center. A version control system is used to track the development history and to support collaborative development.

    UAT           The UAT (user acceptance testing) environment is used to test the integration. In the testing phase, you no longer run the integration in Anypoint Studio, but on a non-production Mule ESB Standalone server.

    Production    The production environment is used to run the integration on a production Mule ESB Standalone server.

    You can use MMC (Mule Management Console) to monitor Mule ESB Standalone server instances. It provides a centralized web-based interface to monitor, manage, and administer the runtime aspects of Mule ESB. See https://docs.mulesoft.com/mule-management-console/v/3.8/deploying-applications.


    Installing a development environment

    In a Collibra Connect development environment, you develop integrations between Collibra Data Governance Center and third-party applications. The development environment consists of Anypoint Studio and the Collibra DGC Connector.

    System requirements - Anypoint Studio
    Install Anypoint Studio
    Install the Collibra DGC Connector in Anypoint Studio
    Import the Collibra domain in Anypoint Studio

    System requirements - Anypoint Studio

    It is highly recommended to use Anypoint Studio 6.x or newer.

    Hardware requirements

    - 4 GB RAM
    - 2 GHz CPU
    - 10 GB free hard disk space

    Java Runtime Environment

    One of the following Java runtime environments:

    Warning: On OS X or macOS, you first have to install JRE 1.6 before installing one of the following JRE versions.

    - Oracle Java SE JDK 7 or 8 (recommended). See https://docs.oracle.com/javase/8/docs/technotes/guides/install/install_overview.html.
    - IBM JVM 1.7
    - OpenJDK 8

    Operating Systems

    - Windows 7, Windows 8, Windows 10 (32-bit and 64-bit)
    - OS X / macOS 10.10.0 or newer
    - RHEL 7.0
    - Ubuntu Server 14.04

    Install Anypoint Studio

    If your computer meets all the system requirements, follow the steps in the MuleSoft documentation (https://docs.mulesoft.com/anypoint-studio/v/6/download-and-launch-anypoint-studio) to download and install Anypoint Studio.

    Install the Collibra DGC Connector in Anypoint Studio

    Prerequisites

    - You have downloaded the latest version of the Collibra DGC Connector (CollibraDGCConnector141UpdateSite.zip in the DGC Connector section) from the Collibra downloads page (https://community.collibra.com/downloads/#1472014018471-a10c17c9-0c8b).
    - Your Anypoint Studio setup is fully operational. Consult the MuleSoft documentation.
    - You have set up the correct Mule ESB server runtime version in Anypoint Studio. We recommend version 3.8.2 or newer.

    Steps

    To install the Collibra DGC Connector in Anypoint Studio, follow these steps:

    1. On the Anypoint Studio menu bar, click Help → Install New Software.
    2. In the Install dialog box, click Add.
    3. In the Add Repository dialog box, type a name for your repository, for example DGC Connector, then click Archive.
    4. Locate the connector archive file (CollibraDGCConnector141UpdateSite.zip) and click Open.
    5. Click OK.
       You return to the Install dialog box.
    6. Expand Community and select Collibra DGC Connector (Mule 3.5.0+).
    7. Click Next twice.
    8. Accept the terms of the license agreement.
    9. Click Finish.
       If you see a warning about unsigned content, click OK to proceed.
    10. Click Yes to restart Anypoint Studio.

    Import the Collibra domain in Anypoint Studio

    The collibra-domain project enables sharing resources between different applications. It also enables you to deploy all integration templates with one click in Anypoint Studio.

    See Shared Resources (https://docs.mulesoft.com/mule-user-guide/v/3.8/shared-resources) to get a better understanding of the domain concept.

    Prerequisites

    - You have downloaded the Collibra domain deployable archive (collibra-domain.zip) from the delivery site (https://community.collibra.com/downloads/#1472014018471-a10c17c9-0c8b).
    - You have correctly installed Anypoint Studio with the Collibra DGC Connector. See Install the Collibra DGC Connector in Anypoint Studio.

    Steps

    To import the Collibra domain in Anypoint Studio, follow these steps:

    1. On the menu bar, click File → Import.
    2. In the Import dialog box, browse to Anypoint Studio and select Anypoint Studio generated Deployable Archive (.zip).
    3. Click Next.
    4. Locate the file collibra-domain.zip and click Open.
    5. Click Finish.
    6. Select the server runtime that you want to use and click OK. For the correct runtime version, consult the compatibility matrix of the archive on the delivery page (https://community.collibra.com/downloads/).

    Result

    In the Package Explorer pane, you see collibra-domain in the list.

    If you imported version 1.0.0, you also see gateway in the list.

    Installing a test and production environment

    A test or a production environment runs integration templates in a Mule ESB Standalone server instance instead of Anypoint Studio.

    System requirements - Mule ESB runtime
    Install Mule ESB Server runtime
    Run Collibra Connect in Mule ESB
    Collibra domain deployments in Mule ESB Standalone server
    Install the Mule Management Console
    Controlling a Mule ESB Server runtime instance

    System requirements - Mule ESB runtime

    It is highly recommended to use Mule ESB 3.8.2 or newer.

    Hardware requirements

    - 2 GHz, dual-core CPU, or 2 virtual CPUs in virtualized environments (Intel Xeon or equivalent)
    - 2 GB RAM
    - 4 GB of storage

    Java Runtime Environments

    - Oracle JRE 7 or 8 (recommended)
    - IBM JVM 1.7
    - OpenJDK 8

    Operating Systems

    - Windows (32-bit and 64-bit):
      - 2003
      - 2008 Server
      - 7
      - 2012 R2
      - 8.1
    - OS X / macOS 10.10 or newer
    - RHEL (64-bit) 5.8, 5.11, 6.6, 7
    - Ubuntu Server 15.04
    - Oracle Solaris 11
    - IBM AIX 7.1

    This list of supported operating systems is not exhaustive. Mule ESB Standalone server should also be compatible with newer versions of the listed operating systems, and with any operating system that supports the listed Java Runtime Environments.

    Install Mule ESB Server runtime

    Prerequisites

    - You have a Mule ESB Server license file. For more information, contact your Collibra Customer Success agent.
    - You have downloaded the Mule ESB Server archive from the Collibra Community downloads page (https://community.collibra.com/downloads/mule-esb-mmc/). Collibra recommends version 3.8.2 or newer.

    Steps

    To install Mule ESB Server runtime on your server, follow these steps:

    1. Extract the Mule ESB archive file to the location of your preference.
    2. Install the license file that you obtained from your Collibra Customer Success agent. See the MuleSoft documentation (https://docs.mulesoft.com/mule-user-guide/v/3.8/installing-an-enterprise-license).

    To control the Mule ESB Server runtime instance, use the Mule Management Console (see Install the Mule Management Console) or see Controlling a Mule ESB Server runtime instance.

    Run Collibra Connect in Mule ESB

    Running Collibra Connect in Mule ESB consists of deploying the Collibra domain and the Collibra Connect integration template on this instance.

    Prerequisites

    - You have installed a Mule ESB Standalone server instance.
    - You have downloaded the Collibra domain application (collibra-domain.zip) from the Collibra downloads page (https://community.collibra.com/downloads/#1472014018471-a10c17c9-0c8b).
    - You have downloaded CollibraDGCConnector141UpdateSite.zip from the Collibra downloads page.

    Steps

    To run Collibra Connect in the Mule ESB Standalone server, follow these steps:

    1. Save collibra-domain.zip in the $MULE_HOME/domains folder.
    2. Save CollibraDGCConnector141UpdateSite.zip in the $MULE_HOME/apps folder.
    3. Start Collibra Connect. See Run integrations with Collibra Connect in Mule ESB Standalone server.
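    Steps 1 and 2 are plain file copies, so they are easy to script when you deploy to several servers. The following is a minimal Python sketch, assuming a standard $MULE_HOME layout with existing domains and apps folders; the function name `deploy_to_mule` is hypothetical, and starting Mule (step 3) is not covered.

```python
import shutil
from pathlib import Path
from typing import Tuple


def deploy_to_mule(mule_home: str, domain_zip: str, app_zip: str) -> Tuple[Path, Path]:
    """Copy a domain archive to $MULE_HOME/domains and an application archive to $MULE_HOME/apps.

    Returns the destination paths. Starting Mule is left to you.
    """
    home = Path(mule_home)
    domains_dir = home / "domains"
    apps_dir = home / "apps"
    # Fail early if mule_home does not look like a Mule ESB Standalone installation
    for folder in (domains_dir, apps_dir):
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")
    # copy2 keeps the file name and metadata, and overwrites an existing archive
    domain_dest = Path(shutil.copy2(domain_zip, domains_dir))
    app_dest = Path(shutil.copy2(app_zip, apps_dir))
    return domain_dest, app_dest
```

    For example, `deploy_to_mule(os.environ["MULE_HOME"], "collibra-domain.zip", "CollibraDGCConnector141UpdateSite.zip")` mirrors the two steps above.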

    Tip: For more information about running applications in Mule ESB Standalone server, consult the Mule documentation (https://docs.mulesoft.com/mule-user-guide/v/3.8/application-deployment#deploying-applications).

    Collibra domain deployments in Mule ESB Standalone server

    To deploy the Collibra domain in Mule ESB Standalone server, see Run Collibra Connect in Mule ESB.

    To redeploy the Collibra domain, drop the new archive file in the domains folder. You do not have to remove the old domain first; a redeploy automatically overwrites it.

    To remove the Collibra domain, delete the anchor file and the directory of the Collibra domain in the domains directory.
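    Removing the domain can be scripted in the same way. This is a minimal Python sketch, assuming the Mule 3 naming convention in which the anchor file of a deployment called collibra-domain is collibra-domain-anchor.txt and the exploded directory is collibra-domain; verify both names in your own domains folder before relying on it. The function name `undeploy_domain` is hypothetical.

```python
import shutil
from pathlib import Path


def undeploy_domain(mule_home: str, domain_name: str = "collibra-domain") -> bool:
    """Delete the anchor file and exploded directory of a deployed domain.

    Assumes the anchor file is named '<domain>-anchor.txt' (Mule 3 convention).
    Returns True if anything was removed, False if the domain was not found.
    """
    domains_dir = Path(mule_home) / "domains"
    anchor = domains_dir / f"{domain_name}-anchor.txt"
    exploded = domains_dir / domain_name
    removed = False
    if anchor.is_file():
        anchor.unlink()  # deleting the anchor is expected to trigger undeployment
        removed = True
    if exploded.is_dir():
        shutil.rmtree(exploded)  # remove the exploded domain directory
        removed = True
    return removed
```

    Redeploying needs no script of its own: copying the new archive over the old one is enough, as described above.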

    Install the Mule Management Console

    To monitor and manage your Mule ESB Server runtime instances, you can use the Mule Management Console (MMC).

    Prerequisites

    - You have downloaded the MMC WAR file from the Collibra Community downloads page (https://community.collibra.com/downloads/mule-esb-mmc/). Collibra recommends version 3.8.2 or newer.
    - You have downloaded Apache Tomcat.

    Steps

    To install the Mule Management Console on your system, follow these steps:

    1. Rename the downloaded MMC WAR archive file to mmc.war.
    2. Extract the Tomcat archive file to the location of your preference.
    3. Copy mmc.war to the webapps folder of the extracted Tomcat directory.
    4. Start Tomcat.
    5. Open a browser and go to http://localhost:8080/mmc. Depending on your Tomcat configuration, you may have to use a different port.
    6. Sign in with the initial credentials (admin/admin).

    To register a Mule ESB Server runtime instance, consult "Registering the Server" in the MMC documentation (https://docs.mulesoft.com/mule-management-console/v/3.8/mmc-walkthrough).

    Controlling a Mule ESB Server runtime instance

    Linux

    To control the Mule ESB Server runtime instance on Linux, go to the bin directory of the installation directory.

    Action           Command
    Start instance   mule start
    Stop instance    mule stop

    Windows

    To control the Mule ESB Server runtime instance on Windows, go to the bin directory of the installation directory.

    Then install the Mule ESB Server as a Windows service: mule install

    Action                                     Command
    Start instance                             mule start
    Start instance using the Windows net tool  net start mule
    Stop instance                              mule stop
    Stop instance using the Windows net tool   net stop mule


    About the Collibra Connect components

    Collibra Connect is a combination of tools and applications that enable you to connect Collibra Data Governance Center with a third-party application. This section gives more details about the Collibra Connect components, such as the Collibra domain and the Collibra DGC Connector.

    Collibra domain application
    Collibra domain purpose
    Collibra domain configuration
    About the gateway application
    Gateway as single point of entry
    Gateway for authentication
    Gateway integration with workflow processes
    Collibra DGC Connector
    About upserting assets
    Configuring Collibra Connect to connect to a proxied Collibra DGC
    Enable debug logging
    Override HTTP PUT and DELETE methods with HTTP POST

    CHAPTER 3


    Collibra domain application

    The collibra-domain application enables sharing resources between different applications. It also enables you to deploy all integration templates with one click in Anypoint Studio, see Run integrations with Collibra Connect in Anypoint Studio. See Shared Resources (https://docs.mulesoft.com/mule-user-guide/v/3.8/shared-resources) to get a better understanding of the domain concept.

    This section describes the purpose of the Collibra domain (collibra-domain) and how to edit and deploy it on a server.

    Collibra domain purpose
    Collibra domain configuration

    Collibra domain purpose

    The Mule ESB Standalone server runs on a single Java Virtual Machine (JVM). Its architecture allows multiple applications to be deployed efficiently on a single server.

    However, to share a resource, for example a TCP port, across different applications deployed on the same server, the configuration of the connector for that resource must be shared as well. Therefore, that configuration has to be defined in a “domain” rather than in each individual “application”. The applications that share the same domain can then share the same resource, for example TCP port 80.

    In short, the Collibra domain is an application that serves as a shared configuration for other applications.

    Collibra domain configuration

    When you deploy the Collibra domain for the first time, some default settings most likely don't apply to your own environment, for example:

    - TCP ports: development and Quality Assurance environments usually run HTTP/S on other ports, while a production environment typically runs on the standard TCP ports 80 and 443.
    - Certificate for HTTPS / SSL.
    - Connectors for other shared services.

    To edit the domain project, follow these steps:

    1. Import collibra-domain.zip as a Mule deployable archive file (zip) into Anypoint Studio. See Import the Collibra domain in Anypoint Studio.
    2. Change the necessary parameters to reflect your environment.
    3. Export the project again as a deployable archive file. See the MuleSoft documentation (https://docs.mulesoft.com/anypoint-studio/v/6/importing-and-exporting-in-studio#exporting-projects-from-studio) for more information.

    About the gateway application

    The Collibra gateway is an application that has the following features:

    - A single entry point for all integration processes.
    - An authentication mechanism.
    - An integration with Collibra DGC workflow processes.

    Gateway as single point of entry
    Gateway for authentication
    Gateway integration with workflow processes

    Gateway as single point of entry

    - As the single point of entry, the gateway listens for HTTP POST requests under collibra.connect.gateway on port 8081. You can change the URL and port values by modifying the gateway.xml file in the gateway project (under src/main/app).
    - When the gateway receives a message, it starts the integration process defined by the flowId parameter of the call.
    - See Triggering an integration for an example of starting an integration by using the gateway.

    Gateway for authentication

    - The gateway authenticates and authorizes all incoming requests.
    - The default credentials are:
      - Username: connect_user
      - Password: connect_password
    - You can change the credentials in the gateway.properties file under src/main/app.
    - Example of specifying Basic Authentication credentials in the Postman REST client:

      [Screenshot: Basic Authentication credentials in Postman]
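    Outside Postman or cURL, the same trigger request can be built in a few lines. The following minimal Python sketch uses only the standard library and builds, but does not send, a POST request with the flowId form parameter and HTTP Basic Authentication, using the default port, path and credentials listed above. The plain-HTTP scheme, the exact URL form http://<host>:8081/collibra.connect.gateway and the flow ID in the test are assumptions; adjust them to your gateway.xml and gateway.properties.

```python
import base64
import urllib.parse
import urllib.request


def build_gateway_request(
    host: str,
    flow_id: str,
    username: str = "connect_user",
    password: str = "connect_password",
    port: int = 8081,
    path: str = "collibra.connect.gateway",
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the gateway to start a flow."""
    url = f"http://{host}:{port}/{path}"  # assumed URL form, check gateway.xml
    # The gateway starts the integration process named by the flowId parameter
    body = urllib.parse.urlencode({"flowId": flow_id}).encode("ascii")
    # HTTP Basic Authentication: base64 of "username:password"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

    To actually send it, pass the request to `urllib.request.urlopen`. Change the default credentials before exposing the gateway outside a test environment.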

    Gateway integration with workflow processes

    - The gateway provides an easy way to integrate with Collibra DGC workflows.
    - If the integration process was started from a Collibra DGC workflow, and the dgcWorkflowProcessInstanceId and dgcWorkflowMessageEventName parameters were provided in the POST request (x-www-form-urlencoded), the gateway sends the whole message payload back to the message event identified by these parameters.
    - See Connecting to the gateway from a workflow for more details.
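    The form body that makes the gateway call back into a workflow can be sketched as follows. This minimal Python example encodes flowId plus the two workflow parameters named above as x-www-form-urlencoded data. Rejecting a call that supplies only one of the two parameters is a design choice of this sketch, not documented gateway behavior; the gateway itself simply does not send the payload back in that case. The parameter values in the test are hypothetical.

```python
import urllib.parse
from typing import Dict, Optional


def gateway_form_body(
    flow_id: str,
    process_instance_id: Optional[str] = None,
    message_event_name: Optional[str] = None,
) -> bytes:
    """Encode the x-www-form-urlencoded body of a gateway call.

    The payload is only sent back to the workflow when both workflow
    parameters are present, so this sketch requires them as a pair.
    """
    params: Dict[str, str] = {"flowId": flow_id}
    if process_instance_id is not None and message_event_name is not None:
        params["dgcWorkflowProcessInstanceId"] = process_instance_id
        params["dgcWorkflowMessageEventName"] = message_event_name
    elif process_instance_id is not None or message_event_name is not None:
        raise ValueError("provide both workflow parameters or neither")
    return urllib.parse.urlencode(params).encode("ascii")
```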


    Collibra DGC Connector

    The Collibra DGC Connector encapsulates the REST API calls to Collibra Data Governance Center to provide session management support and streamline operations.

    This section describes the features of the Collibra DGC Connector.

    Tip: Collibra DGC Connector 1.4.1 supports Collibra DGC 5.1 or newer. If you are running an older version of Collibra DGC, use Collibra DGC Connector 1.3.1.

    About upserting assets
    Configuring Collibra Connect to connect to a proxied Collibra DGC
    Enable debug logging
    Override HTTP PUT and DELETE methods with HTTP POST

    About upserting assets

    Upserting assets is the process of synchronizing entities that are retrieved from an external system with assets in Collibra Data Governance Center.

    The integration logic creates new assets in Collibra DGC for entities that are synchronized for the first time, and updates assets in Collibra DGC for entities that have been synchronized before. In both cases the outcome is the same: the state of the asset in Collibra DGC is updated to the current state of the external entity. This method supports DataSense (https://docs.mulesoft.com/anypoint-studio/v/5/datasense), which makes integration development easier.

    You can configure the upsert operations either to create a new domain and/or community for the assets that need synchronizing, or to upsert to a default domain.

    About upserting assets by ID

    This operation uses the integration framework to upsert assets to Collibra DGC. As a result, multiple calls can be made to the REST API of Collibra DGC during the execution of that function.

    Upserting assets by external entity ID splits the input assets into two collections: one with assets that have been previously synchronized using upsert, and one with assets that are synchronized for the first time. After the split, two imports are done to create or update the assets accordingly. Because of that, the function is not atomic.

    Additional calls can be made to retrieve required information, for example to check whether upserted assets were previously synchronized, or to find the IDs of related assets. The number of API calls does not depend on the number of assets that are upserted.

    The operation is idempotent: performing multiple upserts with the same data yields the same result.
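    The split, the two imports and the idempotency described above can be illustrated with an in-memory model. This is a minimal Python sketch, not the connector's implementation: a dictionary stands in for Collibra DGC, and `externalSystemId` and `externalEntityId` are hypothetical field names for the two parts of the external identity.

```python
from typing import Dict, Iterable, List, Set, Tuple

Asset = Dict[str, str]
# An asset is identified by (external system ID, external entity ID).
ExternalKey = Tuple[str, str]


def external_key(asset: Asset) -> ExternalKey:
    # 'externalSystemId' and 'externalEntityId' are hypothetical field names
    return asset["externalSystemId"], asset["externalEntityId"]


def split_for_upsert(
    assets: Iterable[Asset], already_synced: Set[ExternalKey]
) -> Tuple[List[Asset], List[Asset]]:
    """Split input assets into (to_update, to_create)."""
    to_update: List[Asset] = []
    to_create: List[Asset] = []
    for asset in assets:
        target = to_update if external_key(asset) in already_synced else to_create
        target.append(asset)
    return to_update, to_create


def upsert(store: Dict[ExternalKey, Asset], assets: Iterable[Asset]) -> None:
    """Apply an upsert to an in-memory stand-in for Collibra DGC."""
    to_update, to_create = split_for_upsert(assets, set(store))
    # Two separate "imports": if the second one failed, the first one would
    # already be applied, which is why the real operation is not atomic.
    for asset in to_create:
        store[external_key(asset)] = dict(asset)
    for asset in to_update:
        store[external_key(asset)] = dict(asset)
```

    Running `upsert` twice with the same batch leaves the store unchanged after the first run, which is the idempotency property the text describes.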

    Each upserted asset must have the same structure (the same keys defined), because of the import operation being used. Using a DataWeave transform (https://docs.mulesoft.com/anypoint-studio/v/5/using-dataweave-in-studio) to produce the input data for the upsert assets operation guarantees that all assets have the same structure.
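    If you build the input some other way than with DataWeave, a quick structural check before the upsert can catch mismatched records early. The following minimal Python sketch reports which records deviate from the keys of the first record and shows one way to normalize them. Filling missing keys with None and dropping extra keys are assumptions of this sketch; whether an empty value is acceptable depends on the attribute and on your Collibra DGC configuration.

```python
from typing import Any, Dict, Iterable, List, Sequence


def nonconforming_indexes(assets: Sequence[Dict[str, Any]]) -> List[int]:
    """Return the indexes of assets whose keys differ from those of the first asset."""
    if not assets:
        return []
    expected = set(assets[0])
    return [i for i, asset in enumerate(assets) if set(asset) != expected]


def normalize(assets: Iterable[Dict[str, Any]], keys: Sequence[str]) -> List[Dict[str, Any]]:
    """Give every asset exactly the given keys: missing keys become None, extra keys are dropped."""
    return [{key: asset.get(key) for key in keys} for asset in assets]
```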

    Upserting assets by external ID

    Upserting assets by external ID enables you to create or update assets by using the entity ID that is used in the external system. The combination of the external entity ID and the external system ID identifies an asset.

    You need the following attributes to enable creating assets in Collibra DGC:

    - The name or ID of the default domain where the asset has to be upserted.
    - Optionally, the name or ID of the community in which the specified domain is located.

    If there is no asset in Collibra DGC that corresponds with the external entity ID, the asset is created.

    If there is no domain in Collibra DGC that corresponds with the domain name, the domain is created, provided that the following attributes are specified:

    - domain type ID and/or domain type name
    - domain name
    - community ID and/or community name

    If not all of these attributes are specified, the upsert operation fails.

    If Collibra DGC has no community that corresponds with the upserted community name, the community is created if the community name is provided in the upsert operation. If the community name is not provided, the upsert operation fails.

    If you upsert assets by external ID, you can afterwards move the assets to any domain in Collibra DGC: the external ID ensures that the asset can still be found and updated.
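    The domain and community rules in this section can be summarized as a small decision function. This is a Python sketch under stated assumptions: `DomainSpec`, `plan_domain` and `UpsertError` are hypothetical names, existing domains are looked up by name only, and the returned tuples merely describe what the upsert would create.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


class UpsertError(Exception):
    """Raised when the upsert can neither find nor create a domain or community."""


@dataclass
class DomainSpec:
    domain_name: str
    domain_type_id: Optional[str] = None
    domain_type_name: Optional[str] = None
    community_id: Optional[str] = None
    community_name: Optional[str] = None


def plan_domain(
    spec: DomainSpec, domains: Set[str], communities: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return the objects to create; `communities` maps community name -> ID (hypothetical)."""
    if spec.domain_name in domains:
        return []  # the domain exists, nothing to create
    has_type = bool(spec.domain_type_id or spec.domain_type_name)
    has_community = bool(spec.community_id or spec.community_name)
    if not (spec.domain_name and has_type and has_community):
        raise UpsertError("domain not found and not all creation attributes are given")
    actions: List[Tuple[str, str]] = []
    community_exists = (
        spec.community_name in communities
        or spec.community_id in communities.values()
    )
    if not community_exists:
        # A missing community can only be created from its name.
        if not spec.community_name:
            raise UpsertError("community not found and no community name is given")
        actions.append(("create_community", spec.community_name))
    actions.append(("create_domain", spec.domain_name))
    return actions
```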

    About upserting assets by name

    The operation can make a few additional calls to enrich the input with all the necessary data. For example, for a given domain ID, it searches for the name of the corresponding domain.

    These API calls are made once for the whole set of assets, so their number does not depend on the size of the input data.

    Because of the import operation that is used, each upserted asset must have the same structure (the same keys defined). If you use a DataWeave transform (https://docs.mulesoft.com/anypoint-studio/v/5/using-dataweave-in-studio) to produce the input for the upsert assets operation, the structure is guaranteed to be the same for all assets.

    Upserting assets by name

    Upserting assets by name enables you to create or update assets by using the name of an asset and a domain. You can use this type of upsert if the assets do not have an external ID.

    You need the following attributes to enable creating assets in Collibra Data Governance Center:

    - The name or ID of the domain where the asset has to be upserted.
    - Optionally, if you only provide the domain name, the name or ID of the community in which the specified domain is located.

    If there is no asset in Collibra DGC that corresponds with the asset name, the asset is created.

    If there is no domain in Collibra DGC that corresponds with the domain name, the domain is created, provided that the following attributes are specified:

    - domain type ID and/or domain type name
    - domain name
    - community ID and/or community name

    If Collibra DGC has no community that corresponds with the upserted community name, the community is created if the community name is provided in the upsert operation. If the community name is not provided, the upsert operation fails.

    If you upsert assets by name and an asset name changes, either in Collibra DGC or in the external system, the asset is not updated; instead, a new asset is created.
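    The rename behavior follows from the identity key: by name, an asset is identified only by its domain and its name. The Python sketch below illustrates this with a hypothetical in-memory store (`Store`, `upsert_by_name`); it is not how the connector talks to Collibra DGC.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for Collibra DGC: (domain, asset name) -> attributes.
Store = Dict[Tuple[str, str], dict]


def upsert_by_name(store: Store, domain: str, assets: List[dict]) -> List[str]:
    """Create or update assets identified by name within a domain; return created names."""
    created: List[str] = []
    for asset in assets:
        key = (domain, asset["name"])
        if key not in store:
            created.append(asset["name"])
        # Merge the new values into whatever is already stored under this key.
        store[key] = {**store.get(key, {}), **asset}
    return created
```

    Renaming Customer to Client in the external system leaves the old Customer asset in place and creates a second asset, which is the behavior described above. Upserting by external ID avoids this, because the key does not contain the name.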

    Configure the Collibra DGC Connector connection

    To configure a Collibra DGC Connector connection, follow these steps:

    1. In Anypoint Studio, open your project.
    2. Select the Collibra DGC Connector and click Edit.
    3. Fill in the following fields:
       - Username
       - Password
       - Base Application Url: the URL of your Collibra DGC instance, which must be running.
    4. Select the Enable DataSense option.
    5. Click Test Connection to check that all parameters are correct.
    6. In the Test connection dialog box, click OK.
    7. Click OK to close the Global Element Properties dialog box.

    Configure the parameters for upsert by external ID

    To configure the parameters for upserting assets by external ID, follow these steps:

    1. In Anypoint Studio, open your project.
    2. Select the Collibra DGC Connector.
    3. On the General tab, go to the Basic Settings section and select Upsert assets by external entity ID in the Operation field.
    4. Fill in the following parameters in the General section of the same tab:
       - Default Domain Id
       - External System Id
    5. Click the refresh button next to the Asset Type Id field to refresh the list. The refresh retrieves all available asset types of the selected Collibra DGC instance. This action may take a couple of seconds.
    6. From the new Asset Type Id list, select the type of asset that you want to upsert to Collibra DGC.
       A progress bar shows the progress of retrieving all the properties that an asset of the selected type may have.
       When this process is completed, the DataSense Explorer tab is populated with the retrieved attributes and relations. If new attributes or relations are later assigned to the selected asset type, they remain invisible in DataSense Explorer until you refresh the metadata.
    7. Save the changes.

    Configure the parameters for upsert by name

    To configure the parameters for upserting assets by name, follow these steps:

    1. In Anypoint Studio, open your project.
    2. Select the Collibra DGC Connector.
    3. On the General tab, go to the Basic Settings section and select Upsert assets by name in the Operation field.
    4. In the General section, click the refresh button next to the Asset Type Id field to refresh the list. The refresh retrieves all available asset types of the selected Collibra DGC instance. This action may take a couple of seconds. Click the refresh button again if new asset types do not appear in the list.
    5. From the new Asset Type Id list, select the type of asset that you want to upsert to Collibra DGC.
       A progress bar shows the progress of retrieving all the properties that an asset of the selected type may have.
       When this process is completed, the DataSense Explorer tab is populated with the retrieved attributes and relations. If new attributes or relations are later assigned to the selected asset type, they remain invisible in DataSense Explorer until you refresh the metadata.
    6. Save the changes.

    About the payload of an upsert operation

    UUIDs

    The textual representation of the transformation contains no attribute or relation type names. With the DataSense feature, attribute type names are translated to attribute type UUIDs. Relation types are translated to relation type UUIDs and the kind of relation, which specifies whether the given asset is the head or the tail of the relation. Because UUIDs are used instead of names at runtime, Collibra DGC users can freely rename characteristics in Collibra DGC without having to modify the integration flow logic.

    If the keys of a given attribute or relation type are not defined in the payload, the corresponding attributes and relations are not modified by the upsert assets operation. This means that you can manually modify other characteristics in Collibra DGC without worrying that the integration process removes those changes.
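    A minimal Python sketch of these two rules, assuming a flat dictionary of characteristics keyed by type UUID: names are translated to UUIDs once, and only the keys present in the payload are replaced. The UUID strings (`uuid-description`, `uuid-note`) and the function names are hypothetical placeholders; real type UUIDs come from DataSense.

```python
from typing import Dict, List

Characteristics = Dict[str, List[str]]  # type UUID -> values


def to_uuid_payload(row: Dict[str, List[str]], name_to_uuid: Dict[str, str]) -> Characteristics:
    """Translate characteristic names (design time) into type UUIDs (runtime keys)."""
    return {name_to_uuid[name]: list(values) for name, values in row.items()}


def apply_upsert(current: Characteristics, payload: Characteristics) -> Characteristics:
    """Replace only the characteristics whose keys are present in the payload."""
    updated = dict(current)
    for type_uuid, values in payload.items():
        updated[type_uuid] = list(values)
    return updated
```

    For example, with `{"Description": "uuid-description"}` as the name mapping, a payload that only carries a description leaves a manually added `uuid-note` value untouched.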

    Related assets

    You can define relations by using the id or externalId of the related assets, or by specifying the name of the related asset and its context, namely its domain and community.

    An example of defining relations by using names is shown in the following image:


    About assigning responsibilities

    You can assign responsibilities for upserted assets to users and user groups by using a user ID, user name, group ID or group name.

    In Anypoint Studio, link the proper fields by dragging and dropping the input to the output. See Configure the payload of the upsert operation for more information.

    To assign responsibility for:

    - a user by user name, use the property user.userName.
    - a user by user ID, use the property user.id.
    - a group by group name, use the property group.groupName.
    - a group by group ID, use the property group.id.

    An example of defining responsibilities is shown in the following image:

    In the given example:

    - The Administrator role (00000000-0000-0000-0000-000000005015) is assigned to:
      - A user named john.smith
      - A user group with UUID e01987e3-ec95-4ee0-a7c4-53426398d9fa
    - The Steward role is assigned to a user whose user name is derived from the StewardUserName field in the input payload.

    The responsibilities of each asset defined in the mapping are replaced.

    This means that existing users and groups are overwritten with the users and groups defined in the mapping script.
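    The example mapping and its replace semantics can be sketched in Python as follows. The nesting (role ID mapped to a list of `user`/`group` objects carrying the properties `userName` or `id`) is an assumption for illustration; the real structure is what you map in Anypoint Studio. `STEWARD_ROLE` is a hypothetical placeholder, because the Steward role ID is not given above.

```python
from typing import Dict, List

ADMINISTRATOR_ROLE = "00000000-0000-0000-0000-000000005015"  # from the example above
STEWARD_ROLE = "steward-role-id"  # hypothetical placeholder; look up the real role ID


def responsibilities_for(row: Dict[str, str]) -> Dict[str, List[dict]]:
    """Build the example responsibilities: role ID -> list of users and groups."""
    return {
        ADMINISTRATOR_ROLE: [
            {"user": {"userName": "john.smith"}},
            {"group": {"id": "e01987e3-ec95-4ee0-a7c4-53426398d9fa"}},
        ],
        STEWARD_ROLE: [{"user": {"userName": row["StewardUserName"]}}],
    }


def apply_responsibilities(asset: dict, new: Dict[str, List[dict]]) -> dict:
    """Replace, rather than merge, the responsibilities of an asset."""
    return {**asset, "responsibilities": new}
```

    Because the whole responsibilities entry is replaced, a role that was assigned manually in Collibra DGC but is missing from the mapping disappears after the next upsert.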

    About the upsert parameters

    When you configure the upsert settings, some parameters may require more explanation.

    General properties

    - Default Domain Id: The ID of the domain in Collibra DGC where all new assets have to be created. This parameter is used only when an asset is created, not when it is updated. This enables you to move the imported assets to different domains in Collibra DGC and still have them properly updated by the integration flow, without them being moved back to the original domain. This field is not displayed if you have selected the Upsert by name operation.
      You can also define the default domain in the payload for each asset separately. If it is defined in the payload, the Default Domain Id field is ignored.
    - External System Id: The ID of the external system. It can be any string that uniquely identifies an external system that is integrated with Collibra DGC. This field is not displayed if you have selected the Upsert by name operation.

    DataSense Explorer

    In the DataSense Explorer pane, you can see a set of properties that you can assign to the assets that are going to be synchronized.

    The list is composed of two sets:

    - Fixed properties that exist for each asset type (green section in the following image).
    - Dynamic properties that depend on the selected asset type (gray section in the following image).


    Fixed properties

    - defaultDomain: Functions the same as Default Domain Id but can be defined per asset. In this way, you can upsert assets into multiple domains during one execution of the operation. You can define the default domain by specifying either:
      - the domain's ID; or
      - the domain's name and the ID or name of the parent community.
      This property is optional. If you do not define it, the Default Domain Id is used. If you do define it, it takes precedence over the Default Domain Id.
    - externalId: The unique identifier of an entity in the given external system. This property is required for each entity that has to be synchronized. External IDs have to be represented by immutable properties, so good candidates are UUIDs or database primary keys. They are used to find the related asset in Collibra DGC if a given entity has already been synchronized before. See Integration Framework for a detailed explanation.
    - name: The name of the asset. This property is required for each entity that has to be synchronized. It is often just the name of the entity in the external system. This name must be unique in the domain in which the corresponding asset is going to be created.
    - status: The status of the asset. This property is optional.
    - lastSyncData: The date of the last synchronization of the given entity. This property is optional. It can be used to perform incremental updates, that is, updates that only affect entities that have changed since the last synchronization.


    Dynamic properties

    - attribute types (example: Description: List): The attribute types are represented by a list of objects with only one property, value. The values provided in the list are synchronized as values of the attribute with the corresponding name in the synchronized asset in Collibra DGC.
    - relation types (example: Groups: List): The relation types are represented by a list of objects with two properties, namely id and externalId. You have to make sure that at least one of those two properties is defined. The properties are used to identify the related asset in Collibra DGC.
      - id: represents the Collibra DGC ID of the asset. If this is defined, the relation of type Groups, for example, is created with the asset that has the given ID.
      - externalId: represents the entity in the external system. If this is defined, it can be used to create relations to assets that have already been synchronized with the upsert assets operation or with the integration framework.

    Configuring Collibra Connect to connect to a proxied Collibra DGC

    If you are using a proxy server to connect to Collibra Data Governance Center, you have to define this in the gateway configuration of the Collibra DGC Connector.

    Proceed as follows:

    1. Open the gateway configuration (Package Explorer pane → gateway → src/main/app → gateway.xml).
    2. In the Connections Explorer pane, expand gateway and double-click CollibraDGC to open the properties.

    3. On the Advanced Settings tab, you can define the proxy settings:

    You have to provide proxy settings for every integration template in the Collibra domain that needs the proxy. This is because the configuration of the Collibra DGC Connector is not shared across all templates in the domain, but set separately for each project.

    Tip: To simplify setting and changing the proxy in the future, you can use property placeholders in each configuration. You can provide values for the property placeholders in the same way as for dgc.config.baseApplicationUrl or dgc.config.user in running integrations (see Running integrations in Collibra Connect), or you can follow the steps described in Mule's Configuring properties (https://docs.mulesoft.com/mule-user-guide/v/3.8/configuring-properties#setting-environment-variables-in-anypoint-studio).

    Enable debug logging

    To enable debug logging of an integration template, follow these steps:


    1. In Anypoint Studio, click Run → Run Configurations.
    2. In the Run Configurations dialog box, click Mule Applications → collibra-domain.
    3. Click the Arguments tab.
    4. Update the VM arguments as follows:

    5. Click Apply to save the changes.
    6. Click Run to start the domain in debug mode.

    Alternatively, you can follow the instructions for working with system properties on the MuleSoft website (https://docs.mulesoft.com/mule-user-guide/v/3.6/configuring-properties#system-properties).

    Full information about the calls and parameters that are made, along with the responses sent by the Collibra DGC instance, is available in the console log.

    Override HTTP PUT and DELETE methods with HTTP POST

    You can configure the Collibra DGC Connector to use the HTTP POST method instead of the HTTP PUT and HTTP DELETE methods.

    To enable this feature, open the advanced settings of the gateway. See Configuring Collibra Connect to connect to a proxied Collibra DGC.


    You have to enable this setting for every integration template in the Collibra domain that has to replace the PUT and DELETE methods with POST. This is because the configuration of the Collibra DGC Connector is not shared across all templates in the domain, but set separately for each project.


    Running integrations in Collibra Connect

    In this section, you learn how to run integrations in Collibra Connect by using either Anypoint Studio or Mule ESB Standalone server.

    Import an integration template 39

    Prerequisites 39

    Steps 39

    Run integrations with Collibra Connect in Anypoint Studio 39

    Run integrations with Collibra Connect in Mule ESB Standalone server 41

    Triggering an integration 42

    Manually start an integration using the Collibra domain and gateway 42

    Manually start an integration: options 43

    Starting an integration without the Collibra domain and gateway 43

    Setting the runtime environment of integration templates 44

    Check the Mule ESB runtime version 45

    Change the embedded Mule runtime 46

    Install a new Mule runtime version 46

    CHAPTER 4


    Steps 46


    Import an integration template

    To link Collibra DGC with an external system, you have to import the correct integration template or develop your own integration template. See Developing a custom integration.

    Prerequisites

    - You have downloaded the integration template of your choice from the Collibra Marketplace (https://community.collibra.com/marketplace/).
    - You have imported the Collibra domain. See Import the Collibra domain in Anypoint Studio.

    Steps

    Importing an integration template works the same way as importing the Collibra domain. The only difference is that you select the integration template archive instead of the collibra-domain.zip archive.

    1. On the menu bar, click File → Import.
    2. In the Import dialog box, browse to Anypoint Studio and select Anypoint Studio generated Deployable Archive (.zip).
    3. Click Next.
    4. Locate your integration template archive and click Open.
    5. Click Finish.
    6. Select the server runtime that you want to use and click OK. For the correct runtime version, consult the compatibility matrix of the archive on the downloads page (https://community.collibra.com/downloads/).

    The imported integration template appears in the Package Explorer pane.

    Run integrations with Collibra Connect in Anypoint Studio

    Running integrations with Collibra Connect in Anypoint Studio starts the Collibra DGC Connector, deploys the selected integration templates and establishes a connection to your Collibra DGC environment.


    Tip: Only run Collibra Connect in Anypoint Studio for development purposes. Use Mule ESB Standalone server for production purposes. See Run integrations with Collibra Connect in Mule ESB Standalone server.

    Steps

    To run integrations with Collibra Connect in Anypoint Studio, follow these steps:

    1. On the menu bar, click Run → Run Configurations.
    2. In the Run Configurations dialog box, right-click Mule Applications, then click New.
    3. Define the new configuration:
       - In the Name field, type a name for your configuration.
       - In the General tab, select the check box in front of collibra-domain.
       - Optionally, select the correct runtime version in the Target Server Runtime field.
    4. In the Environment tab, configure the connection settings to Collibra Data Governance Center.
       To create a variable, click New, fill in the Name and Value and click OK to save the variable.
       Add the proper variables. See Collibra DGC environment variables.
    5. Click Apply.
    6. Click Run.

    If the integration templates have been successfully deployed, you can see the following in the Console tab:


    Run integrations with Collibra Connect in Mule ESB Standalone server

    Running integrations with Collibra Connect consists of deploying the Collibra domain and one or more integration templates in the Mule installation directories and defining a connection to a Collibra DGC environment.

    To run an integration with Collibra Connect in Mule ESB Standalone server, follow these steps:

    1. Deploy the required integration templates and collibra-domain on the server:
       - Save the collibra-domain.zip in the $MULE_HOME/domains folder.
       - Save the required integration templates in the $MULE_HOME/apps folder.
    2. Define the variables of the connection to Collibra DGC as additional JVM parameters when you start the server, or add them to the $MULE_HOME/conf/wrapper.conf file. See Collibra DGC environment variables.
       Example of configuration in wrapper.conf:


    wrapper.java.additional.8=-Ddgc.config.user=username
    wrapper.java.additional.9=-Ddgc.config.password=password
    wrapper.java.additional.10=-Ddgc.config.baseApplicationUrl=http://localhost:8080/com.collibra.dgc.war

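    3. Start the Mule ESB runtime server from the $MULE_HOME/bin directory (see https://docs.mulesoft.com/mule-user-guide/v/3.8/starting-and-stopping-mule-esb):
       - Mac or Linux: ./mule
       - Windows: mule.bat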

    Read the Mule deployment section (https://docs.mulesoft.com/mule-user-guide/v/3.8/deploying) for a description of all deployment possibilities.

    You can deploy and manage applications on Mule ESB Standalone server with the Mule Management Console. See Mule Management Console (https://docs.mulesoft.com/mule-management-console/v/3.8/) for the user guide and reference materials.

    Triggering an integration

    With Collibra Connect, you can start integrations automatically, for example by using Collibra DGC workflows. When you use the Collibra Connect gateway, you can also start an integration manually by sending an HTTP POST request.

    Manually start an integration using the Collibra domain and gateway 42

    Manually start an integration: options 43

    Starting an integration without the Collibra domain and gateway 43

    Manually start an integration using the Collibra domain and gateway

    To manually start an integration, follow these steps:

    1. Open your HTTP client, for example Postman.
    2. Send a POST request to http://localhost:8081/collibra.connect.gateway with the correct options. See Manually start an integration: options.


    As a result, the integration with the given flowId starts. The payload parameter specifies a message that is passed to and interpreted by the integration process.

    The following image displays an example of starting the integration process from the Postman REST client.

    Tip: Instead of manually starting an integration, you can also schedule the start of an integration. See Scheduling the trigger of an integration.

    Manually start an integration: options

    - flowId: Identifier of the integration.
    - payload: The payload that is sent to the process.
    - Username: Collibra Connect username.
    - Password: Collibra Connect password.
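    If you prefer a script over Postman, the request can be built with Python's standard library. This is a sketch under a loud assumption: it sends the four options as form-encoded body fields, but this section does not say how the gateway expects them, so check your gateway configuration; they may need to be sent as query parameters, headers or JSON instead. The function name `build_trigger_request` and the flow ID `my-flow` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

GATEWAY_URL = "http://localhost:8081/collibra.connect.gateway"


def build_trigger_request(flow_id: str, payload: str, username: str, password: str) -> Request:
    """Build (but do not send) the POST request that starts an integration.

    Assumption: the gateway reads the four options as form-encoded body fields.
    """
    body = urlencode(
        {"flowId": flow_id, "payload": payload, "Username": username, "Password": password}
    ).encode("utf-8")
    return Request(
        GATEWAY_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

    To actually send it, pass the request to `urllib.request.urlopen(build_trigger_request("my-flow", "start", "user", "secret"))`.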

    Starting an integration without the Collibra domain and gateway

    Instead of starting an integration by using the Collibra domain and gateway, you can also start an integration by using Anypoint endpoints.

    Examples of Anypoint Studio endpoints:

    - HTTP inbound endpoint: Starts an integration if it receives an HTTP request (PUT, GET, DELETE, POST).
    - File inbound endpoint: Starts an integration if it receives an incoming file.
    - Database endpoint: Starts an integration if it receives a database request, for example, to poll content from the database.
    - JMS endpoint: Starts an integration if it receives a new message in a configured queue.

    Tip: Instead of manually starting an integration, you can also schedule the start of an integration. See Scheduling the trigger of an integration.

    Setting the runtime environment of integration templates

    A Collibra Connect application package often contains multiple properties files, one per environment. The name of each properties file contains the environment name in the middle, as shown in the following example.

    In this situation, the placeholder configuration contains an environment variable, as shown in the following image:


    For development environments, you can specify the environment variables in the mule-project.xml file:

    For deployment to a server, environment variables can either be appended to the server's wrapper.conf file (located in the conf directory) or be added to the operating system. In wrapper.conf, the environment variables are part of the additional parameters:

    wrapper.java.additional.15=-Dmule.env=dev

    wrapper.java.additional.16=-Dapp.key=testkey

    Note: The numbers must be sequential and can vary between servers, depending on the other parameters that are being used.
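    Keeping the numbers sequential by hand is error-prone when a server's wrapper.conf already contains several entries. The following Python sketch is a hypothetical helper, not part of Mule or Collibra Connect: it reads the existing wrapper.java.additional.<n> entries and numbers new -D parameters after the highest index it finds. It assumes active entries start at the beginning of a line, so commented-out entries are ignored.

```python
import re
from typing import Dict, List

# Active entries only: a leading '#' (commented-out line) does not match.
_ENTRY = re.compile(r"^wrapper\.java\.additional\.(\d+)=", re.MULTILINE)


def next_additional_lines(conf_text: str, props: Dict[str, str]) -> List[str]:
    """Return wrapper.conf lines for `props`, numbered after the highest existing index."""
    used = [int(n) for n in _ENTRY.findall(conf_text)]
    start = max(used, default=0) + 1
    return [
        f"wrapper.java.additional.{start + i}=-D{key}={value}"
        for i, (key, value) in enumerate(props.items())
    ]
```

    For a wrapper.conf whose highest entry is number 14, `next_additional_lines(conf, {"mule.env": "dev", "app.key": "testkey"})` produces the two lines shown above with the numbers 15 and 16.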

    Check the Mule ESB runtime version

    Before you install the integration templates and the Collibra DGC Connector, you have to check that you have the correct embedded Mule runtime version in Anypoint Studio.

    To check the version of the embedded Mule runtime in Anypoint Studio, follow these steps:

    1. Open Anypoint Studio.
    2. In the Package Explorer pane, click the arrow next to any project.
    3. In the files, look for the text Mule Server and check its version.

    For example:


    Change the embedded Mule runtime

    You may have installed more than one version of the Mule ESB server runtime.

    To change the runtime of a project, follow these steps:

    1. In the Package Explorer pane, expand the project of your choice and search for mule-project.xml.
    2. Double-click it to open the package details.
    3. In the Server Runtime field, select the runtime that you want to use for the package.
    4. Close the details tab.

    Install a new Mule runtime version

    Steps

    To install an extra Mule runtime version, follow these steps:

    1. On the Anypoint Studio menu bar, click Help → Install New Software.
    2. In the Work with field, select All Available Sites.
    3. In the filter field, type Mule ESB Server Runtime to limit the number of results.
    4. Scroll through the results and select the check box(es) of the runtime instances that you want to install.
    5. Click Next twice.
    6. Accept the terms of the license agreement and click Finish.


    Result

    The new Mule ESB Server runtime versions are installed and can be selected per integration template.


    Developing a custom integration

    In this section, you learn how to develop custom integrations and how to integrate your own integration template with Collibra Connect.

    University development course 51

    Designing an integration template 51

    Overview use case 51

    Overview possible scenarios 52

    Configuring integration templates 57

    Learn about the Collibra Connect components 66

    Use gateway with a custom integration template 66

    Prerequisites 66

    Steps 66

    About the gateway entry and end point 69

    Connecting to the gateway from a workflow 70

    Create a connection to the gateway 70

    Connect to the gateway from a workflow 70

    CHAPTER 5


    Sending information to the workflow 71

    Processing a message in a workflow 72

    Scheduling the trigger of an integration 73

    Configuring the trigger interval 73

    Configuring the integration template to start 74

    Configuring the payload 75

    Starting the integration template 75

    Edit Collibra integration templates in Anypoint Studio 75


    University development course

    Collibra University has a development course that can help you to create your own integration templates. This course covers the following topics:

    - Install Anypoint and Create a Basic Flow
    - Install Mulesoft and MMC on a Server
    - Walkthrough of the HANA, Tableau and Hadoop Flows
    - Traceability Customization

    You can access this course on the Collibra University pages (https://university.collibra.com/courses/integration-with-collibra/).

    Designing an integration template

    When you start creating a new integration template, you first have to decide which metadata you are going to import with Collibra Connect and how you are going to update it.

    Overview use case 51

    Overview possible scenarios 52

    Configuring integration templates 57

    Overview use case

    The following figure is a high-level overview of the integration scenario for importing data in Collibra DGC.


    Step 0: Starting the integration flow. This can be done in three different ways:
    - A user starts a workflow in Collibra DGC that calls the integration flow in Collibra Connect.
    - A workflow in Collibra DGC calls the integration flow at a recurring interval.
    - The integration flow is called directly.

    Step 1: The integration flow has started.

    Step 2: The integration flow retrieves data from an external system. This can be done by using:
    - A custom connector to the external system, for example the IBM BG Connector.
    - A Relational Database connector to connect to a relational database.
    - Another connector, for example a file-based connector such as FTP, to read a file from a remote location.

    Step 3: The integration flow transforms the data from the external system into a format that the Collibra DGC Connector can understand. To do this:
    i. Define the CSV data model of the data to be imported in Collibra DGC.
    ii. Use the MuleSoft Mapper to convert the data from the external system into the data model required by Collibra DGC.

    Step 4: Import the data into Collibra DGC by using the CSV Import function of the Collibra DGC Connector.

    Overview possible scenarios

    When you define an integration flow to import data from another application into Collibra DGC, the integration has to support a few scenarios. Below is a list of the most common and important scenarios.

    Scenario: Only import the delta

    Description: You typically do not want to import the entire external system every time you run the integration. You only want to import the entities from the external system that are new or have changed since the last time the integration ran. This is mainly for performance reasons.

    Solution: To accomplish this, you have to:

    1. Uniquely identify the entity in the external system.

    2. Map the external entity to the Collibra DGC asset.

    3. Keep track of the last sync date.

    4. Include the last sync date in the query that retrieves the external system entities.

    Scenario: Move asset in Collibra DGC

    Description: After the integration flow has imported an external entity into Collibra DGC, you usually want to move that asset to a different community or domain. Other systems typically do not have the same concept of communities or domains, as they are not governance solutions. When you have moved an asset in Collibra DGC to a different community or domain, you do not want the asset to be created again the next time you run the integration flow. The integration flow has to recognize that the asset already exists in Collibra DGC, but in a different location.

    Solution: To accomplish this, you have to:

    1. Uniquely identify the entity in the external system.

    2. Map the external entity to the Collibra DGC asset.

    3. Update the (moved) asset instead of creating a new one when importing.

    Scenario: Update asset in external system

    Description: When an entity is updated in the external system, you want the changes to be reflected in the asset in Collibra DGC. You do not want to create a completely new asset in Collibra DGC, because you might have added other attributes or relations to the asset that you do not want to lose.

    Solution: To accomplish this, you have to:

    1. Uniquely identify the entity in the external system.

    2. Find the corresponding asset in Collibra DGC through the mapping. If the corresponding asset exists in Collibra DGC, update it instead of creating a new one.

    Scenario: Recreate asset in Collibra DGC

    Description: When an imported asset in Collibra DGC is deleted by accident, you typically want it to be re-created the next time the integration flow runs.

    Solution: To accomplish this, you have to:

    1. Uniquely identify the entity in the external system.

    2. Find the corresponding asset in Collibra DGC through the mapping.

    3. Re-create the corresponding asset if it no longer exists in Collibra DGC.

    Scenario: Delete asset in external system

    Description: When an entity is deleted in the external system, you usually also want to delete the corresponding asset in Collibra DGC. Alternatively, you could change the status of the asset in Collibra DGC to, for example, Deleted or Deprecated.

    Solution: You have multiple options.

    Some external systems provide an easy way to retrieve the entities that were removed since a given point in time. If this is the case, you can:

    1. Retrieve the unique identifiers of the entities that were removed in the external system.

    2. Find the corresponding assets in Collibra DGC through the mapping.

    3. Remove the assets that were found, or change their status.

    If the external system does not provide information about removed entities, you can:

    1. During the integration process, review all the Collibra DGC assets that are mapped to entities in the external system.

    2. For each mapped asset in Collibra DGC, check if the entity still exists in the external system (ideally in batch instead of one by one).

    3. If the entity no longer exists in the external system, delete the asset in Collibra DGC or change its status.
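    The second option for the Delete scenario boils down to comparing the mapped external IDs with the IDs that still exist in the external system, one batch at a time. The following Java sketch assumes the mappings have already been loaded into memory; the Mapping class and the existingIds lookup function are hypothetical stand-ins for the Collibra DGC Connector mapping operations and for a batch query against the external system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/**
 * Sketch: find Collibra DGC assets whose external entity no longer exists.
 * Mapping and the existingIds lookup are hypothetical, not Connector APIs.
 */
final class DeletedEntityDetector {

    /** Hypothetical mapping record: external entity ID to Collibra DGC asset ID. */
    static final class Mapping {
        final String externalId;
        final String assetId;

        Mapping(String externalId, String assetId) {
            this.externalId = externalId;
            this.assetId = assetId;
        }
    }

    /**
     * Returns the IDs of assets whose external entity is gone. existingIds receives
     * up to batchSize external IDs per call and returns the subset that still exists.
     */
    static List<String> findOrphanedAssets(List<Mapping> mappings,
                                           Function<List<String>, Set<String>> existingIds,
                                           int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> orphaned = new ArrayList<>();
        for (int from = 0; from < mappings.size(); from += batchSize) {
            List<Mapping> batch = mappings.subList(from, Math.min(from + batchSize, mappings.size()));
            List<String> ids = new ArrayList<>();
            for (Mapping m : batch) {
                ids.add(m.externalId);
            }
            Set<String> stillThere = existingIds.apply(ids); // one external query per batch
            for (Mapping m : batch) {
                if (!stillThere.contains(m.externalId)) {
                    orphaned.add(m.assetId); // candidate for deletion or a status change
                }
            }
        }
        return orphaned;
    }
}
```

    Whether the returned asset IDs are deleted or only get their status changed to Deleted or Deprecated is decided by the flow that consumes the list.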

    When you use the upsert assets operation, you do not have to look up the corresponding assets in Collibra DGC through the mapping manually, because this is done automatically. You only have to provide the unique identifier of the entity in the external system (for more information, see About upserting assets).

    The operation also re-creates assets that have been removed from Collibra DGC, and it checks whether an asset already exists and creates or updates it accordingly. This way, you are free to move assets to different domains in Collibra DGC, as described in the Move asset in Collibra DGC scenario.

    Configuring integration templates

    Typically, the high-level integration flow logic consists of several steps, but the configuration that you have to perform differs slightly for each method: importing CSV data or upserting assets.

    Importing CSV data

    To configure the import of CSV data, you have to configure the following steps:

    Integration step: Get External Data

    Description: In this step, the external system is queried to retrieve the data. This can be done through an SQL connector, a custom connector or something similar.

    Sometimes it is useful to create a Java class (POJO), so that the retrieved data is instantiated as Java objects. That makes it easier to test and map the data to the Collibra DGC format in later steps. It also improves the robustness of the solution.

    Integration step: Enrich with Mapping

    Description: This is the most complex step. In this step:

    1. The unique ID of the external asset is identified.

    2. The mapping between the external asset and the Collibra DGC asset is created or updated.

    3. The asset is created, if it does not exist yet.

    4. The two Delete cases are handled, either on the Collibra DGC side or on the external system side.

    For more information about enriching assets with mapping, see Enriching assets with mapping.

    Integration step: Convert to CSV

    Description: In this step, the external data is transformed into the format that Collibra DGC understands. Usually CSV is used, because it is the most performant way to update the assets in Collibra DGC. The import only updates the existing assets in Collibra DGC and does not create new assets. This is necessary to cope with the Move asset in Collibra DGC and Update asset in external system scenarios.

    For more information about converting data to CSV format, see Converting data to CSV.

    Integration step: Import in Collibra DGC

    Description: In this step, the Collibra DGC Connector is used to import the CSV data from the previous step into Collibra DGC.

    For more information about importing CSV data into Collibra DGC, see Importing CSV data in Collibra DGC.
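    The Get External Data step and the Only import the delta scenario fit together in a small POJO plus a filter on the last sync date. In this sketch the class names, the entities table and the last_modified column are hypothetical. Ideally the external system filters by date itself (the DELTA_QUERY string); the in-memory filter is only a fallback for sources that cannot.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical POJO for one entity retrieved from the external system. */
final class ExternalEntity {
    final String id;            // unique ID in the external system
    final String name;
    final String description;
    final Instant lastModified; // change timestamp reported by the external system

    ExternalEntity(String id, String name, String description, Instant lastModified) {
        this.id = Objects.requireNonNull(id);
        this.name = name;
        this.description = description;
        this.lastModified = Objects.requireNonNull(lastModified);
    }
}

/** Sketch of delta selection based on the last sync date. */
final class DeltaSelector {
    // Hypothetical table and column names. Bind the last sync date to the ? placeholder,
    // for example with PreparedStatement.setTimestamp(1, java.sql.Timestamp.from(lastSync)).
    static final String DELTA_QUERY =
        "SELECT id, name, description, last_modified FROM entities WHERE last_modified > ?";

    /** In-memory fallback: keep only entities changed after lastSync. */
    static List<ExternalEntity> changedSince(List<ExternalEntity> all, Instant lastSync) {
        List<ExternalEntity> delta = new ArrayList<>();
        for (ExternalEntity e : all) {
            // A null lastSync means the integration never ran: import everything.
            if (lastSync == null || e.lastModified.isAfter(lastSync)) {
                delta.add(e);
            }
        }
        return delta;
    }
}
```

    One reasonable design choice is to store the time at which a run started, not the time it finished, as the next last sync date, so that entities changed during the run are picked up again by the next run.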

    Upserting assets

    To configure upserting assets, you have to configure the following steps:

    Integration step: Get External Data

    Description: In this step, the external system is queried to retrieve the data. This can be done through an SQL connector, a custom connector or something similar.

    Sometimes it is useful to create a Java class (POJO), so that the retrieved data is instantiated as Java objects. That makes it easier to test and map the data to the Collibra DGC format in later steps. It also improves the robustness of the solution.

    This step is identical to the corresponding step of the CSV import method.

    Integration step: Transform using DataWeave

    Description: This step contains the core business logic of the integration. Here you define how to transform the entity retrieved from the external system into an asset of the chosen type in Collibra DGC.

    Integration step: Upsert to Collibra DGC

    Description: In this step, the given assets are upserted into Collibra DGC. This is equivalent to the last three steps described in Importing CSV data: each asset is enriched with the mapping to check whether it was previously synchronized into Collibra DGC, and the conversion to CSV and the import also happen automatically.

    For more information about upserting assets, see About upserting assets.

    Enriching assets with mapping

    Enriching assets with mapping iterates over every external asset, generates its unique ID, and enriches the asset with the mapping.

    You can accomplish this with a subflow, as shown in the following image:

    Generate natural key

    How you generate the natural key of the external system asset depends entirely on the external system.

    Ideally, the external system already has a unique ID that identifies the asset in a consistent way. If this is not the case, it is good practice to create a subflow that generates the natural key in a consistent way, as shown in the following image:
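    Outside Mule, the same idea can be as simple as a normalized path of the names that identify the entity. The helper below is hypothetical, not part of Collibra Connect. It assumes the identifying parts (for example system, schema, table and column names) are known, and it lowercases them, which is only appropriate if the external system treats those names case-insensitively.

```java
import java.util.Locale;
import java.util.Objects;

/** Sketch: build a deterministic natural key from the parts that identify an entity. */
final class NaturalKeys {
    private static final char SEPARATOR = '/';

    static String of(String... parts) {
        if (parts.length == 0) {
            throw new IllegalArgumentException("at least one key part is required");
        }
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            // Trim and lowercase; lowercasing assumes case-insensitive names in the source.
            String part = Objects.requireNonNull(parts[i], "key part " + i)
                    .trim().toLowerCase(Locale.ROOT);
            if (i > 0) {
                key.append(SEPARATOR);
            }
            // Escape the escape character first, then the separator, so that
            // ("a/b", "c") and ("a", "b/c") can never produce the same key.
            key.append(part.replace("\\", "\\\\").replace("/", "\\/"));
        }
        return key.toString();
    }
}
```

    Escaping the separator matters because natural keys must stay unique: without it, two different entities could collapse onto the same key and share one Collibra DGC asset.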

    Enrich with mapping

    To enrich each asset with the right mapping information, you create a subflow that typically looks like the following image:

    It can consist of the following steps:

    Step: Find the mapping

    Description: Use the Collibra DGC Connector to find a mapping to a Collibra DGC asset based on the external asset ID (its unique ID) and the external system ID.

    Example:

    Step: Delete mapping

    Description: When a mapping is found but has no Collibra DGC asset ID, the asset has been deleted in Collibra DGC. You need to clean up the existing mapping by deleting it.

    Example:

    Step: Check if mapping exists

    Description: Depending on whether a mapping already exists between the external asset and a Collibra DGC asset, you need different behavior. If the mapping already exists, the asset in Collibra DGC also already exists and you only have to update its last sync date. If not, you need to create the asset in Collibra DGC, as well as the mapping.

    Example:

    Step: Create asset and mapping

    Description: If the mapping does not exist yet, you have to create the asset in Collibra DGC, as well as the mapping.

    Example:
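    Taken together, the four steps amount to a small decision procedure per external asset. The following Java sketch keeps the mappings in an in-memory map keyed by external system ID and external asset ID; this store and the createAsset callback are hypothetical stand-ins for the corresponding Collibra DGC Connector operations, whose exact signatures are not given in this section.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Sketch of the "Enrich with mapping" decision logic. The in-memory store and the
 * createAsset callback are hypothetical stand-ins for Collibra DGC Connector calls.
 */
final class MappingEnricher {

    static final class Mapping {
        String assetId;        // null: the Collibra DGC asset was deleted
        Instant lastSyncDate;

        Mapping(String assetId, Instant lastSyncDate) {
            this.assetId = assetId;
            this.lastSyncDate = lastSyncDate;
        }
    }

    /** Mappings keyed by [external system ID, external asset ID]. */
    final Map<List<String>, Mapping> store = new HashMap<>();

    /** Returns the Collibra DGC asset ID to use for this external asset. */
    String enrich(String externalSystemId, String externalId, Instant now,
                  Supplier<String> createAsset) {
        List<String> key = Arrays.asList(externalSystemId, externalId);
        Mapping mapping = store.get(key);                 // Find the mapping
        if (mapping != null && mapping.assetId == null) { // Delete mapping: asset is gone
            store.remove(key);
            mapping = null;
        }
        if (mapping != null) {                            // Mapping exists: only update
            mapping.lastSyncDate = now;                   // the last sync date
            return mapping.assetId;
        }
        String assetId = createAsset.get();               // Create asset and mapping
        store.put(key, new Mapping(assetId, now));
        return assetId;
    }
}
```

    Because the lookup goes through the mapping rather than through a community or domain, an asset that was moved in Collibra DGC keeps its ID and is updated, not duplicated.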

    Converting data to CSV

    The first step in converting the external data to the Collibra DGC format is deciding what you want to import into Collibra DGC.

    Always use the unique identifier of the asset in Collibra DGC. This is important to support the following scenarios:

    • If an asset has been renamed in Collibra DGC, the correct asset is updated and is also renamed to the external system asset name.

    • If an asset has been moved in Collibra DGC, the correct asset is updated instead of a new asset being created.

    You get the asset ID from the mapping information that you gathered in the previous step.

    The next image displays an example of a mapping from a POJO to a simple CSV structure:
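    A plain Java equivalent of such a mapping writes one line per asset, in the column order that the TableViewConfig in Table view configuration expects: Id, Signifier, ParentId, Name, Description, DataType. The AssetRow class is hypothetical, the quoting follows RFC 4180, and the sketch writes no header row; check what your import configuration expects before relying on that.

```java
import java.util.List;

/** Sketch: write rows in the column order Id, Signifier, ParentId, Name, Description, DataType. */
final class CsvRowWriter {

    /** Hypothetical row, already enriched with the Collibra DGC asset ID from the mapping. */
    static final class AssetRow {
        final String id, signifier, parentId, name, description, dataType;

        AssetRow(String id, String signifier, String parentId,
                 String name, String description, String dataType) {
            this.id = id;
            this.signifier = signifier;
            this.parentId = parentId;
            this.name = name;
            this.description = description;
            this.dataType = dataType;
        }
    }

    /** Quotes a field when it contains a comma, quote or line break (RFC 4180 style). */
    static String field(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? '"' + value.replace("\"", "\"\"") + '"' : value;
    }

    static String toCsv(List<AssetRow> rows) {
        StringBuilder out = new StringBuilder();
        for (AssetRow r : rows) {
            // The Id column comes first so that the import can UPDATE the mapped asset.
            String[] cols = {r.id, r.signifier, r.parentId, r.name, r.description, r.dataType};
            for (int i = 0; i < cols.length; i++) {
                if (i > 0) {
                    out.append(',');
                }
                out.append(field(cols[i]));
            }
            out.append("\r\n");
        }
        return out.toString();
    }
}
```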

    Importing CSV data in Collibra DGC

    You can use the Collibra DGC Connector to import the CSV that results from converting the external data.

    The important part here is the TableViewConfig, which specifies how Collibra DGC has to interpret the CSV data and map it to Collibra DGC concepts.

    The following image shows an example import step as well as the TableViewConfig.

    Table view configuration

    You have to configure the TableViewConfig as follows:

    • Asset (Term) ID should be the unique identifier of the Collibra DGC asset. You can get that ID from the mapping information.

    • The default operation has to be UPDATE, to cope with the scenarios described earlier. You already created the asset, so you do not have to CREATE anything anymore.

    The following is an example TableViewConfig, as used in the previous examples:

    {
      "TableViewConfig": {
        "Columns": [
          {"Column": {"fieldName": "Id", "index": 0}},
          {"Column": {"fieldName": "Signifier", "index": 1}},
          {"Column": {"fieldName": "ParentId", "index": 2}},
          {"Column": {"fieldName": "Name", "index": 3}},
          {"Column": {"fieldName": "Description", "index": 4}},
          {"Column": {"fieldName": "DataType", "index": 5}}
        ],
        "Default": {
          "Operations": {"Term": ["UPDATE"]}
        },
        "Resources": {
          "Term": {
            "Id": {"name": "Id"},
            "Signifier": {"name": "Signifier"},
            "StringAttribute": [
              {"LongExpression": {"name": "Description"}, "labelId": "${dgc.metamodel.attribute.descriptionId}"},
              {"LongExpression": {"name": "Name"}, "labelId": "${dgc.metamodel.attribute.nameId}"},
              {"LongExpression": {"name": "DataType"}, "labelId": "${dgc.metamodel.attribute.dataTypeId}"}
            ],
            "Relation": {
              "typeId": "00000000-0000-0000-0000-000000007042",
              "type": "SOURCE",
              "KeyContent": ["ParentId"],
              "Target": {"Id": {"name": "ParentId"}}
            },
            "name": "Asset"
          }
        }
      }
    }

    Learn about the Collibra Connect components

    Before you start developing an integration template, make sure that you fully understand all the Collibra Connect components:

    • Collibra domain application
    • About the gateway application
    • Collibra DGC Connector

    Use gateway with a custom integration template

    When you develop a custom integration template, you may have to integrate with Collibra DGC workflows or configure the security of those templates.

    The gateway template provides an example and a quick start for adding those features to the integration templates. To reuse the gateway features, the only thing you need to do is "register" the integration template that the gateway has to use.

    Prerequisites

    You have installed collibra-domain. See Import the Collibra domain in Anypoint Studio.

    Steps

    To use the gateway with a custom integration template, follow these steps:


    1. In your new integration template, find the VM connector in the palette:

    2. Drag the connector to create the message source of your main integration template:

    Note: By default, the name of the connector (the entry point) is VM.

    3. Double-click the connector to change its properties:

       i. On the General tab, update the Display Name to a meaningful name.

       ii. From the Connector Configuration list, select shared_vm_connector.

       iii. Define the Queue Path. It specifies the name by which the gateway recognizes the integration. It corresponds to the flowId parameter that is used when triggering integrations through the gateway. See Triggering an integration.

    4. Drag and drop a VM connector at the end of your integration flow.

    Note: The VM connector is also added as the last element of the exception handling strategy: when an exception occurs during the integration process, it is also useful to notify the workflow about it and close the Collibra DGC session.

    5. Double-click the connector to change its properties:

       i. On the General tab, update the Display Name to a meaningful name.

       ii. From the Connector Configuration list, select shared_vm_connector.

       iii. In the Queue Path field, type gateway_end_process.

    6. Define the payload that must be sent to the gateway by adding an expression in front of the end point. Example of setting a response to "Success" using the Expression component:

    Result

    Your integration script is fully integrated with the Collibra gateway template.

    For more examples using the gateway, see About integration templates.

    About the gateway entry and end point

    Entry point

    The gateway communicates with other applications using the VM connector. Adding an entry point to the application allows the gateway to start the required integration process.

    End point

    If the integration process is triggered by a Collibra DGC workflow, you may want to send the result back to the workflow instance or close the Collibra DGC session. Closing the session prevents it from being left open after the integration process has ended.

    You can perform each of those steps manually anywhere in the flow (see Sending information to the workflow), but the gateway also handles them automatically.

    The gateway sends all text properties that are contained in the payload to the Collibra DGC workflow instance and closes the session. Those properties can later be accessed as variables of the given workflow process instance.

    The gateway sends the response back to the workflow only if the dgcWorkflowProcessInstanceId and dgcWorkflowMessageEventName properties were sent when the integration process was started. See Connecting to the gateway from a workflow for more details.
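    The forwarding rules above can be summarized as a small function: forward only the text properties of the payload, and only when both workflow properties were provided when the integration started. The class below is a hypothetical model of that behavior for illustration; it is not the gateway's actual implementation and does not model closing the Collibra DGC session.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of which payload properties the gateway forwards to a workflow. */
final class GatewayEndPointSketch {

    /** Returns the variables to send, or empty when no response goes back to a workflow. */
    static Optional<Map<String, String>> variablesToSend(Map<String, Object> payload,
                                                         Map<String, Object> startProperties) {
        if (startProperties.get("dgcWorkflowProcessInstanceId") == null
                || startProperties.get("dgcWorkflowMessageEventName") == null) {
            return Optional.empty(); // not started by a workflow: nothing is sent back
        }
        Map<String, String> variables = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : payload.entrySet()) {
            if (e.getValue() instanceof String) { // only text properties are forwarded
                variables.put(e.getKey(), (String) e.getValue());
            }
        }
        return Optional.of(variables);
    }
}
```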

    Connecting to the gateway from a workflow

    In this section, you learn how to integrate the gateway in a Collibra DGC workflow, trigger the gateway and send messages back to the workflow.

    Create a connection to the gateway

    Connect to the gateway from a workflow

    Sending information to the workflow

    Processing a message in a workflow

    Create a connection to the gateway

    You can start the Collibra Connect gateway by performing an HTTP POST call to a predefined URL (see About the collibra-domain application). To be able to call the gateway from Collibra DGC, you have to provide the following connection properties in the Collibra DGC configuration.xml file:

    • base-url: URL where the Collibra Connect gateway is waiting for the requests.
    • username: Name of the user that is allowed to start the gateway process.
    • password: Password of the given user.

    An example with default values:
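    Apart from the Collibra DGC configuration, any HTTP client can make the POST call that starts the gateway. The following Java 8 sketch assumes HTTP Basic authentication with the configured username and password and a JSON request body; both are assumptions, and the exact URL and parameters (such as flowId) are described in Triggering an integration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: trigger the gateway with an authenticated HTTP POST (Basic auth is assumed). */
final class GatewayTrigger {

    /** HTTP Basic authentication header value for the configured username and password. */
    static String basicAuth(String username, String password) {
        String token = username + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** Posts a request body to the gateway URL and returns the HTTP status code. */
    static int post(String url, String username, String password, String jsonBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Authorization", basicAuth(username, password));
            conn.setRequestProperty("Content-Type", "application/json"); // assumed body format
            try (OutputStream out = conn.getOutputStream()) {
                out.write(jsonBody.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```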

    Connect to the gateway from a workflow

    To call the gateway from a Collibra DGC workflow, you have to use the following delegate:

    com.collibra.dgc.core.workflow.activiti.delegate.StartCollibraConnectFlowDelegate

    The integration is started when the workflow reaches the state defined by the delegate.

    An example of configuring the delegate to start the integration flow (the flowId parameter defines which integration is started):

    Sending information to the workflow

    To send information from the integration template to the Collibra DGC workflow, you can use a message event mechanism.

    The Collibra DGC Connector contains a method for sending message events:

    With the configuration shown, a message event "dgcWorkflowMessageEventName" is sent to the workflow process instance identified by "dgcWorkflowProcessInstanceId".

    Additionally, all text properties contained in the message payload are accessible in the workflow as standard variables, for example ${status}.

    Processing a message in a workflow

    To receive BPMN message events in the workflow, you have to use a MessageCatchingEvent.

    You can add the message event to the workflow:

    Additionally, the name of the message event and the process instance ID of the workflow have to be sent to the gateway using the delegate.

    The "messageEventName1" value corresponds to the configuration of the MessageCatchingEvent.

    Scheduling the trigger of an integration

    The following sections provide an example and describe how to trigger an integration periodically without using Collibra DGC workflows or any other external mechanism.

    To start an integration process periodically from Mule, you can send a trigger event (with the required frequency) to an endpoint of the chosen flow.

    You can find the example application on the Collibra delivery site.

    The following screenshot is the visual representation of the flow.

    Configuring the trigger interval

    Configuring the integration template to start

    Configuring the payload

    Starting the integration template

    Configuring the trigger interval

    A Poll scope is used to produce events with the required frequency. To configure the frequency, go to the properties of the Poll component and fill in the required value in the Frequency field.

    Example:


    With the configuration shown above, the trigger event is produced every 30 seconds.

    Also set the Start delay property to 30 seconds, to allow the required integration flow to start before the first trigger event is sent.
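    The Frequency and Start delay pair behaves much like a JDK scheduler with an initial delay and a period. The sketch below is an analogy rather than Mule's implementation; whether the Poll scope uses fixed-rate or fixed-delay timing is an assumption that this document does not confirm. triggerOffsets computes when the first trigger events fire.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch: the Frequency / Start delay settings expressed with a JDK scheduler. */
final class PeriodicTrigger {

    /** Offsets (in ms after start-up) of the first n trigger events. */
    static List<Long> triggerOffsets(long startDelayMs, long frequencyMs, int n) {
        List<Long> offsets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            offsets.add(startDelayMs + i * frequencyMs);
        }
        return offsets;
    }

    /** Runs sendTrigger every frequencyMs, starting after startDelayMs (fixed-rate timing). */
    static ScheduledExecutorService start(Runnable sendTrigger, long startDelayMs, long frequencyMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(sendTrigger, startDelayMs, frequencyMs, TimeUnit.MILLISECONDS);
        return scheduler; // call shutdown() when the application stops
    }
}
```

    With a start delay and a frequency of 30 seconds each, the first three events fire 30, 60 and 90 seconds after start-up.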

    Configuring the integration template to start

    The required integration is started by sending an event to a queue of the VM endpoint exposed by the integration.

    The following image shows an event being sent to the integration, which is started by events sent to the dgc2ibmbg queue.

    To find the name to put in the Queue Path field, go to the section of the template's documentation that describes triggering the integration process, and look for the value of the flowId parameter that is used to start the integration.

    If you have already imported the templates into Anypoint Studio, you can also open the file called endpoints.xml and check the Queue Path of the required VM connector, as shown in the following image.
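    To look the value up outside Anypoint Studio, the Queue Path of a VM inbound endpoint can be read from endpoints.xml with the JDK's DOM parser. This sketch assumes the Mule 3 VM namespace http://www.mulesoft.org/schema/mule/vm and that the Queue Path field is stored in the path attribute of vm:inbound-endpoint; verify both against your own endpoints.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Sketch: list the queue paths of the VM inbound endpoints declared in endpoints.xml. */
final class QueuePathFinder {
    // Assumption: Mule 3 VM transport namespace as declared in endpoints.xml.
    static final String VM_NS = "http://www.mulesoft.org/schema/mule/vm";

    /** Returns the path (Queue Path) of every vm:inbound-endpoint in the given XML. */
    static List<String> inboundQueuePaths(String endpointsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // match on the namespace URI, not the "vm" prefix
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(endpointsXml.getBytes(StandardCharsets.UTF_8)));
            NodeList endpoints = doc.getElementsByTagNameNS(VM_NS, "inbound-endpoint");
            List<String> paths = new ArrayList<>();
            for (int i = 0; i < endpoints.getLength(); i++) {
                String path = ((Element) endpoints.item(i)).getAttribute("path");
                if (!path.isEmpty()) {
                    paths.add(path);
                }
            }
            return paths;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse endpoints XML", e);
        }
    }
}
```

    Outbound endpoints, such as the one that points at gateway_end_process, are deliberately ignored because only inbound endpoints start an integration.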

    Configuring the payload

    To configure the payload of the event that has to be sent to the required integration, edit the value of the Set Payload component, as shown in the following image:

    Starting the integration template

    This example integration template is attached to the collibra-domain (see About the collibra-domain application), so it is started along with the other applications in the domain. The trigger events are sent periodically, with the frequency set as described in Configuring the trigger interval.

    Edit Collibra integration templates in Anypoint Studio

    Collibra Connect integration templates are shipped with generic development endpoints, settings and credentials. Therefore, you have to modify them before you can deploy them in your specific environment. You can modify a template with Anypoint Studio, which can be downloaded from the Collibra Community downloads page. Import the template in Anypoint Studio as a Mule project for editing.

    Tip: Templates can be mavenized or not mavenized. A template is mavenized if the template ZIP file contains a pom.xml file.

    To edit an existing Collibra integration template, follow these steps:

    1. In Anypoint Studio, click File → Import.

    2. In the Import dialog box, use the option that matches your template:

       • Import a mavenized template using Maven: if you use Maven as your build automation framework:
         i. Extract the template archive file.
         ii. Expand Anypoint Studio and select Maven-based Mule Project from pom.xml.

       • Import a mavenized template without using Maven: if you do not use Maven:
         i. Extract the template archive file.
         ii. Expand Anypoint Studio and select Anypoint Studio Project from External Location.

       • Import a template as a deployable archive: if the template is packaged as a deployable archive (the archive does not contain a src/main/apps folder), expand Anypoint Studio and select Anypoint Studio generated Deployable Archive (.zip).

    3. Apply the necessary changes to the template.

    4. Click File → Export.

    5. Expand Mule and click Anypoint Studio Project to Mule Deployable Archive (includes Studio metadata).

    Note: If you use Maven to create a package:

       i. Open a terminal session.
       ii. Go to the directory that contains the template's pom.xml file.
       iii. Execute the command: mvn clean package.
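    The mavenized check from the tip above can be automated before you choose an import option. The following Java sketch scans a template ZIP file for a pom.xml at its root or one folder deep; that location rule is an assumption about how templates are packaged, not a documented guarantee.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Sketch: decide whether a template archive is mavenized (contains a pom.xml). */
final class TemplateArchive {

    /** True if the ZIP has a pom.xml at its root or directly inside one top-level folder. */
    static boolean isMavenized(InputStream zip) {
        try (ZipInputStream in = new ZipInputStream(zip)) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                String name = entry.getName();
                if (!entry.isDirectory()
                        && (name.equals("pom.xml") || name.matches("[^/]+/pom\\.xml"))) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    For example, TemplateArchive.isMavenized(new FileInputStream("template.zip")) tells you whether to follow one of the two mavenized import options or the deployable archive option; the file name is a placeholder.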


    CHAPTER 6

    Collibra Marketplace

    The Collibra Marketplace contains a variety of integration templates. The templates help you accelerate Collibra implementations and help the data citizens in your organization adopt a data culture.

    You can reach the Collibra Marketplace at marketplace.collibra.com, where you can download the templates and their documentation.

    Chapter 7

    Frequently Asked Questions

    Do I need a license for Collibra Connect?

    Yes. To use Collibra Connect, your Collibra DGC license has to support Collibra Connect.

    Please reach out to your Collibra Customer Success representative or contact us at [email protected] for more information on the pricing and features of the Collibra Connect license.

    What are the current versions of Collibra Connect?

    The current version is 1.3.x for the Collibra DGC Connector, running on Anypoint Mule ESB 3.7.3 with Java 1.8 (Java 1.7 is also supported).

    Collibra recommends Mule ESB 3.8.2 or newer.

    Are there trainings provided?

    Please contact your account executive and customer success manager for Collibra Connect specific training courses. For general Anypoint development, free and paid courses can be found here.

    Is Collibra Connect installed on the same server as Collibra DGC?