
Deploying Blaze Advisor Rule Service in Hadoop

© 2013 Fair Isaac Corporation. All rights reserved. June 24, 2013

Summary

Blaze Advisor is a powerful, high-performance rule engine built upon open standards. It can be deployed

in the Apache™ Hadoop® framework to process Big Data, taking advantage of Hadoop’s support for

parallel and distributed computing. In addition, Blaze Advisor can be configured and deployed easily in

Hadoop so that business analysts can focus on developing decision services with Big Data including

unstructured and semi-structured data in a user-friendly authoring environment.

This article describes how integrating Blaze Advisor with Hadoop enables decision management

applications to process Big Data, and it lists several factors to consider when developing an integration

method. It includes two use cases that demonstrate how to integrate a Blaze Advisor decision service

with Hadoop in order to manage decisions with Big Data.

Making Decisions with Big Data

Blaze Advisor is deployed in numerous mission-critical business applications around the world for the

real-time or near real-time processing of business transactions. In this scenario, one or multiple Blaze

Advisor rule services usually run in an application server cluster that scales the rule service application

based on demand. In addition, Blaze Advisor is sometimes used to perform batch processing on large

data sets with predictive analytic models and expert rules. In both the real-time and batch processing

scenarios, Blaze Advisor is normally processing well-defined transactional data that is stored in traditional

databases or data warehouse applications.

The high-volume, high-velocity, and high-variety nature of Big Data, however, presents the

following problems for typical rules engines and decision management applications:

• It is time consuming and sometimes infeasible to get large volumes of data out of the data

warehouse applications and push it through a decision management application.

• It is difficult to develop traditional data models for decision modeling and processing because of how

fast data is created and modified, and the variety of sources and formats generating it. For example,

decision management systems usually require pre-defined business object models (BOMs)

developed in XML, Java, or other languages to be imported into the authoring environment before

users can create decision logic. This approach is not ideal for handling the velocity and variety of Big

Data. Instead, a flexible data model that can be created and maintained by the data analysts or

decision management application users is needed.

Deploying Blaze Advisor in the Hadoop framework is a solution that enables rules engines and decision

management applications to process Big Data. Hadoop is an open-source technology that enables

parallel work on large clusters and massive amounts of data. Instead of using a single computer, Hadoop

distributes the data analysis among tens, hundreds, and even thousands of computers. Hadoop and its

ecosystem have become a leading platform for distributed processing in the cloud. Hadoop provides a

distributed file system for high-throughput data access (HDFS, the Hadoop Distributed File System).

This solution enables Blaze Advisor to process and manage decisions with Big Data, while meeting the

standard objectives for decision management, which are as follows:

• Help businesses automate business decisions quickly, accurately, and consistently.

• Enable organizations to develop analytic models and operationalize the models with full model life

cycle management.

• Provide a tool for the business to manage decision logic with minimal dependency on IT.


Integrating Blaze Advisor Rule Services with Hadoop

Blaze Advisor’s support for Hadoop is tightly coupled with its ability to author and execute

decision logic with Big Data. A Blaze Advisor decision service can be integrated with Hadoop through

the Hadoop Streaming API. Hadoop streaming allows the use of arbitrary programs for the map and

reduce phases of a MapReduce job. Input and output data are always represented as text in streaming.
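Because streaming exchanges all records as plain text over standard input and output, any executable that honors this contract can act as a mapper or reducer. The following minimal sketch (a hypothetical class, not part of Blaze Advisor or Hadoop) illustrates the contract; the runner classes shown in Use Case 1 follow the same read-from-stdin, write-to-stdout pattern.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical illustration of the streaming contract: one record per line on stdin,
// one key<TAB>value pair per line on stdout.
public class StreamingEchoMapper {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            // Treat the first csv field as the key and the full record as the value.
            String key = line.split(",", 2)[0];
            System.out.println(key + "\t" + line);
        }
    }
}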

Before selecting a method for integrating Blaze rule services with Hadoop, consider the following factors:

1. Data Structure. One of the characteristics of Big Data is variety—Big Data may be unstructured,

semi-structured, or structured.

The data that is collected via social media, communication records, browsing habits, and so on is

unstructured or semi-structured text data (non-text data, such as voice, can be transformed into

text). Making decisions based on unstructured or semi-structured data requires rapid parsing and

indexing of the tokens in the text data. Blaze Advisor 7.2 includes a new library that makes text

analysis fast and easy (FICO’s Model Builder 7.4 also includes this library). This new library is called

TAE (Text Analytic Engine), and it is built on top of Apache Lucene™. Predictive models can be

created with unstructured data and executed in Blaze Advisor. Additional variables can also be

created and calculated from unstructured text data.

For structured data, Blaze Advisor supports several methods for writing business logic with the data,

including XML BOMs and csv data. An XML schema representing an XML business object model can

be imported into Blaze at authoring time, and XML data can be passed into rule services for

processing at runtime. The output can be XML, or it can be a totally different format. Hadoop has a

StreamXmlRecordReader for reading XML input for the Mapper.

Using csv data as the input, however, is a more flexible approach. This method allows business

analysts to define and modify the mapping of the input fields to the business object model in a simple

and iterative fashion—without a predefined data model. The csv fields can be mapped to the

properties of a Blaze class or to Blaze variables.

2. Mapper Integration. The map procedure in the Hadoop MapReduce framework segments and

distributes the input to the nodes in a Hadoop cluster for parallel processing. A Blaze Advisor rule

service can then run map tasks to process the segmented input data with business rule logic, such as

scoring and determining the best action. To run Blaze Advisor in a map-only job, the data records being

processed must not have any dependencies between them (a minimal job configuration for this case is sketched after this list).

3. Reducer Integration. Reduce tasks receive the output of the map tasks, grouped and sorted by

key. The reduce tasks are often used to aggregate data with a selected key. A Blaze rule service

can be used in a reduce task to execute business decisions on sorted data. A reduce task can be

implemented in Blaze Advisor in many ways, including the following two:

• Writing Structured Rule Language (SRL) code to aggregate the values with business logic. This

method relies on the preservation of global values across invocations, which is supported by

Blaze Advisor.

• Invoking a Blaze rule service in a reduce task so that all data records with the same key are

passed into Blaze Advisor at the same time.

4. Deployment Packaging. When running a Blaze rule service as a map job, the Blaze runtime library, rule

service code (adb file), license keys, rule service configuration, and all other application-specific code

can be packaged into a single .zip file. This enables the package to be distributed to all the nodes in

the Hadoop cluster easily. Blaze Advisor does not need to be pre-installed in the Hadoop system.


5. Hot Deployment. Blaze Advisor users typically want rule changes to be deployed automatically or

without significant build work. This can be achieved using the Blaze deployment manager and a

custom script. The deployment manager monitors the timestamp of the generated rule service adb

file. When the timestamp is changed, the deployment manager triggers the auto-build process that

updates the job packages with the new adb file and runs the job. Business analysts can use a Rule

Maintenance Application (RMA), which is a Web-based application for writing rules, to update the

decision logic and deploy the updated code to the Hadoop system.

6. Performance Optimization. To optimize performance in a Hadoop deployment, the best practices

for using Blaze Advisor should be followed. This includes choosing the correct engine mode,

choosing a high-performance XML parser if XML data is used, and enabling multi-threading.

Likewise, standard Hadoop best practices should be applied to Blaze deployments on Hadoop.

This includes tuning job configuration parameters and using combiners when possible.
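The map-only and combiner recommendations above are properties of the Hadoop job configuration rather than of Blaze Advisor itself. The following sketch, which assumes the classic org.apache.hadoop.mapred API used later in this article, shows both settings; MapOnlyDriver is a hypothetical class, IdentityMapper stands in for a Blaze-backed mapper, and MyCombiner is a placeholder.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

// Hypothetical driver sketch: a map-only job, with a combiner setting shown in a comment.
public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapOnlyDriver.class);
        conf.setJobName("blaze-map-only");

        // Swap IdentityMapper for a mapper that invokes the Blaze rule service.
        conf.setMapperClass(IdentityMapper.class);

        // Zero reduce tasks makes this a map-only job; mapper output goes straight to HDFS.
        conf.setNumReduceTasks(0);

        // When a reduce phase is present, a combiner can pre-aggregate map output locally:
        // conf.setCombinerClass(MyCombiner.class);   // MyCombiner is a placeholder

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        conf.setOutputKeyClass(LongWritable.class);   // TextInputFormat keys are byte offsets
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}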

Use Cases

The following two use cases demonstrate how to integrate a Blaze Advisor decision service with Hadoop

using Hadoop streaming. The first is a general use case that processes csv text data. The

second is more specialized: the data is in XML, and standard MapReduce calls are made in Java.

Both examples use traditional financial data (specifically, loan application data); however, their methods

can be applied to other business domains such as marketing, customer management and pricing.

Use Case 1: CSV Data

A customer needs to batch process auto-loan applications via a MapReduce job. Each application has an

account ID and other variables such as applicant income and date opened. The account ID may be

associated with multiple applications. An average risk score is calculated for each account ID (via a

decision table). Hadoop automatically splits the account applications file and sends the segmented data

to Blaze deployment bundles on the cluster, which perform both the map and reduce phases as follows:

• The mapper looks at variables and assigns a risk score according to a decision table lookup.

• The reducer collects all risk scores associated with a given account ID and calculates their average.

This use case includes the following sections summarizing how Blaze Advisor was integrated with

Hadoop to process CSV data:

• Blaze rules project.

• Rules service deployment.

• Hadoop deployment setup.

• Test environment and performance.

• Evaluation.

• Alternative mapping approach for CSV data.

Blaze Rules Project

This section describes the data model and entry points used in the CSV use case.

Data Model

The project uses a simple SRL data model that a business user can create easily. The business user can

then map the fields in a csv line to the SRL data model. An alternate method for mapping csv data in

Blaze Advisor is provided in Alternative Mapping for CSV Data later in this section.


The csv data used in this example is as follows:

20060718,43,2916.00,2,F,H,f,6,0,2,647,-99000784,18,0,43,607,34,82.5,N

20070521,38,4100.00,3,_,H,b,1,0,4,761,18,-99000576,0,100,1508,12,61.30000305,N

Entry Point Functions

The rules project contains a simple decision table with a map entry point function and a reduce entry point

function. The map entry point takes a csv string with the account ID as a key and converts it into an SRL

object. It returns the account ID and its score. The following code snippet demonstrates the map entry

point function:

//Split the csv line by comma, then assign each field to a property of the Loan object
input is a string initially param;
stringSegments is some fixed array of string initially input.split(",");
theLoan is some Loan initially a Loan;
theLoan.dateOpened = stringSegments[0];
theLoan.appAge = stringSegments[1] as an integer;
theLoan.appIncome = stringSegments[2] as a real;
theLoan.appChkSv = stringSegments[3] as a real;
theLoan.appFinanceCo = stringSegments[4] as a string;
theLoan.appResidence = stringSegments[5] as a string;
theLoan.appTimeAddress = stringSegments[7] as a string;
theLoan.cb90Ever = stringSegments[8] as an integer;
theLoan.dealLoanToVal = stringSegments[17] as a real;
apply RiskScoreRules(theLoan).
if theLoan.riskScore = unknown then theLoan.riskScore = 99;
return(input ","theLoan.riskScore);

The reduce entry point function takes the output from the map entry point (a csv string with an account ID

and score), calculates the average score per unique account ID, and returns the account ID and an

average score as a string. The reduce procedure receives all records with the same key (account ID) grouped together, from


which the average score can be calculated. This example uses simple logic to demonstrate the reduce

procedure in a Blaze rule service. More complex logic may be used to process the sorted and aggregated

data in an actual deployment. The following code snippet demonstrates the reduce entry point function:

//In the reduce procedure the key is followed by the first tab
input is a string initially param;
stringSegments is some fixed array of string initially input.split("\\t");
theLoan is some Loan initially a Loan;
theLoan.accountId = stringSegments[0];
//split the rest of the line and map only the score
stringSegments2 is some fixed array of string initially stringSegments[1].split(",");
theLoan.riskScore = stringSegments2[19] as a real;
avgScore is a real initially 0;
returnValue is a string initially "";
//calculate average score per account ID and return average score once for each unique
//account ID
if (lastAccountId is unknown)
then {
    count = 1;
    totalScore = theLoan.riskScore;
    lastAccountId = theLoan.accountId;
    returnValue = "";
} else if (lastAccountId is known and lastAccountId = theLoan.accountId)
then {
    count = count + 1;
    totalScore = totalScore + theLoan.riskScore;
    returnValue = "";
} else {
    avgScore = totalScore/count;
    returnValue = lastAccountId"\t"avgScore;
    lastAccountId = theLoan.accountId;
    count = 1;
    totalScore = theLoan.riskScore;
}
return returnValue;

Rule Service Deployment

A Stateless POJO rule server deployment is used. The rule server is configured with a single rule service

with map and reduce entry points. The rule server loads the rules from an adb file. To support calculating average score across multiple records with the same ID, the rule agent recycle policy is set to None. This

ensures that global variables are not automatically reinitialized across rule service sessions. SRL code is

used to reset the values when the account ID has changed.


A Blaze Map runner class and a Blaze Reducer runner class are created for the map and reduce entry

point functions. Each runner class simply reads from System.in and passes the lines to the rule service

one line at a time. The following code snippet from the loanapp.RunnerReducer.java class

demonstrates how to create and configure the rule server:

// Create the server
String serverConfig = (args.length > 0) ? args[0] : _SERVER_CONFIG;
Server server = (Server)Server.createServer(serverConfig);

// Create the client of the server
Client client = new Client(server);

BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line = br.readLine();
while (line != null) {
    String result = client.reduceEntryPoint(line);
    if (!result.equalsIgnoreCase("")) System.out.println(result);
    line = br.readLine();
}

// Shut down the Server server instance
server.shutdown();

For more information about generating a Stateless POJO rule server deployment and generating an adb

file in Blaze Advisor, see the product documentation.

Hadoop Deployment Setup

The following steps are used to set up the Hadoop deployment in the CSV use case:

1. Create a single job package (an approximately 30MB .zip file) that includes the following components:

• Blaze deployment libraries.

• The generated POJO code.

• The adb file.

• The server configuration file.

• Blaze deployment license.

2. Copy the data file and the job .zip file to HDFS.

3. Create one script that runs the ReducerRunner class, and another one that runs the MapRunner

class.

4. Run the map and reduce tasks using Hadoop Streaming (which communicates via standard I/O). The

output of the map task is sorted and shuffled, then piped to the reduce task. The output of the reduce


task is written to files. The following command submits the streaming job, shipping the mapper.sh,

reducer.sh, runblazemap.sh, and runblazereduce.sh scripts that invoke the map and reduce entry point functions:

hadoop jar /usr/lib/hadoop-0.20-MapReduce/contrib/streaming/hadoop-streaming-0.20.2-cdh3u4.jar \
  -input /user/fred/blaze/autoloan-17M.txt \
  -output /user/fred/blaze/autoloans \
  -mapper /home/fred/LoanApp/mapper.sh \
  -reducer /home/fred/LoanApp/reducer.sh \
  -file /home/fred/LoanApp/mapper.sh \
  -file /home/fred/LoanApp/runblazemap.sh \
  -file /home/fred/LoanApp/reducer.sh \
  -file /home/fred/LoanApp/runblazereduce.sh \
  -cacheArchive /user/fred/blaze/LoanAppJob.zip#LoanAppDeploy

The mapper.sh and reducer.sh scripts simply pass the contents of standard input to the entry point

functions:

mapper.sh:

/bin/cat | ./runblazemap.sh

reducer.sh:

/bin/cat | ./runblazereduce.sh

Test Environment and Performance

The deployment test was run in a Hadoop cluster with four Virtual Machine data nodes (eight cores each).

The results of the deployment test are as follows:

• Input file size: approximately 17 million lines (approximately 1.6 GB).

• Run time: 15:23 (mm:ss), using 11 map tasks and 4 reduce tasks; the average throughput equates to 1.8 MB/s.

• Map tasks duration: approximately 04:30 (mm:ss); the throughput of just the map tasks is 6 MB/s.

Note: A stress test was performed using a 250 GB input file in the same test environment, and the results

were consistent with the first test. The run time of the map tasks was 12 hours, which equates to a

throughput of 5.9 MB/s.

Evaluation

This example demonstrates that running a Blaze Advisor rule service via Hadoop MapReduce can be

used for the large-scale batch processing of Big Data, and this solution is appropriate for specific

business cases. Blaze Advisor and the Eclipse environment can be integrated with Hadoop without any

modifications.

Alternative Mapping for CSV Data

Instead of using an SRL class in the Blaze Advisor IDE to map the CSV data, users can improve

performance by creating new variables directly in the RMA. With this approach, users can create

variables of different data types and then map them to corresponding csv fields by index. The


user-created variables are based on Java types and therefore are compiled into Java code at the time the

project adb file is generated.

After creating and mapping the variables, the user can then define decision logic in the same RMA. For

example, they can create new rules and decision tables using the variables they defined. The following

graphic demonstrates a decision table in an RMA with user-defined variables.

Use Case 2: XML Data with Java MapReduce

Similar to the first use case, a customer needs to batch process auto-loan applications in order to

determine the approved loan amount, and Blaze rules are deployed in Hadoop to perform the batch

processing. The data in this case, however, is XML. In addition, Java Map and Reduce implementations

are used instead of simply running Blaze rule services as jobs in the Hadoop streaming framework.

These implementations provide a tighter integration with the MapReduce framework.

• The map task executes a ruleflow that invokes various rulesets to assign an approved loan amount

to each application.


• The reduce task collects all records with the same user name and calculates the sum of the loan amounts.

This use case includes the following sections summarizing how Blaze Advisor was integrated with

Hadoop to process XML data with a Java MapReduce implementation:

• Blaze rules project.

• Rules service deployment.

• Hadoop deployment setup.

• Evaluation.

Blaze Rules Project

This section describes the data model and entry points used in the XML with Java MapReduce use case.

Data Model

A BOM is defined in an XML schema and imported into the Blaze Advisor IDE. In this example, the

Hadoop core classes for I/O are also imported into the IDE so they can be directly used to manipulate the

data.

The data used in this example consists of a single XML file that contains 150,000 loan application records.

Each record is on a single line. A key is not used. The following example demonstrates a loan application

record in the XML file.

<?xml version="1.0"?>
<LoanApplication>
  <firstName>John</firstName>
  <lastName>Applicant</lastName>
  <middleName>G</middleName>
  <age>22</age>
  <married>false</married>
  <children>0</children>
  <socialSecurityNumber>123-45-6789</socialSecurityNumber>
  <streetAddress>123 Fourth St.</streetAddress>
  <city>Newark</city>
  <state>NJ</state>
  <zipCode>07101</zipCode>
  <telephoneNumber>201-123-4567</telephoneNumber>
  <yearsAtAddress>0</yearsAtAddress>
  <own>false</own>
  <monthlyHousingPayment>275.0</monthlyHousingPayment>
  <monthlyCarPayment>575.0</monthlyCarPayment>
  <monthlyCreditCardPayment>775.0</monthlyCreditCardPayment>
  <currentEmployer></currentEmployer>
  <employerAddress></employerAddress>
  <employerCity></employerCity>
  <employerState></employerState>
  <employerZipCode>0</employerZipCode>
  <employerTelephoneNumber></employerTelephoneNumber>
  <yearsEmployed>0</yearsEmployed>
  <monthlySalary>0.0</monthlySalary>
  <previousEmployer></previousEmployer>
  <yearsPreviouslyEmployed>0</yearsPreviouslyEmployed>
  <amount>1000.0</amount>
  <loanTerm>36</loanTerm>
</LoanApplication>

Entry Point Functions

The rules project contains a map entry point function that processes the loan applications and a reduce

entry point function that totals the approved loan amount for each unique applicant.

The map entry point function takes a key-value pair (the key is not defined or used in this use case) and

converts the value, an XML record, to an SRL object. The function then starts a ruleflow that invokes the

various rules. The approved loan amount and the applicant’s name, which functions as a key, are added

to the Hadoop output collector object.


The reduce entry point function takes the key-value pair of the applicant’s name and the list of approved

loan amounts, adds the loan amounts, and then returns the applicant’s name and the total loan amount

via the Hadoop OutputCollector object.

Rule Service Deployment

Similar to the first use case, a stateless POJO rule server is used for deployment, and the rule server is

configured with a single rule service with map and reduce entry point functions. In the entry point

functions, the Hadoop objects are passed in as arguments. The actual result is directly added to the

Hadoop OutputCollector object in Blaze Advisor; therefore, a return value is not expected. The rule

server loads the rules from an adb file. The following code demonstrates the map and reduce entry point

functions that are used to deploy the rules service:

...
public class SrlInvoker extends NdStatelessServer {

    final static String SERVER_CONFIG = "./Rule_Service_Definition1.server";

    public static SrlInvoker createInstance() {
        try {
            return (SrlInvoker) SrlInvoker.createServer(_getConfigContents(), null);
        } catch (NdLocalServerException e) {
            e.printStackTrace();
        }
        return null;
    }
    ...
    public SrlInvoker(NdServerConfig arg0) throws NdLocalServerException {
        super(arg0);
    }

    public void mapEntryPoint(org.apache.hadoop.io.WritableComparable arg0,
            org.apache.hadoop.io.WritableComparable arg1,
            org.apache.hadoop.mapred.OutputCollector arg2)
            throws NdServerException, NdServiceException, NdServiceSessionException {
        Object[] applicationArgs = new Object[3];
        applicationArgs[0] = arg0;
        applicationArgs[1] = arg1;
        applicationArgs[2] = arg2;
        invokeService("Rule Service Definition1", "map", null, applicationArgs);
    }

    public void reduceEntryPoint(org.apache.hadoop.io.WritableComparable arg0,
            org.apache.hadoop.io.WritableComparable[] arg1,
            org.apache.hadoop.mapred.OutputCollector arg2)
            throws NdServerException, NdServiceException, NdServiceSessionException {
        Object[] applicationArgs = new Object[3];
        applicationArgs[0] = arg0;
        applicationArgs[1] = arg1;
        applicationArgs[2] = arg2;
        invokeService("Rule Service Definition1", "reduce", null, applicationArgs);
    }
}

Hadoop Deployment

In this use case, custom Hadoop Mapper and Reducer methods are implemented in Java, and the

respective entry points inside the map and reduce methods are invoked. The following code snippets

demonstrate how this is done:

//Mapper implementation example
public class SrlMapper extends MapReduceBase implements
        org.apache.hadoop.mapred.Mapper<WritableComparable<?>, WritableComparable<?>,
        WritableComparable<?>, WritableComparable<?>> {

    private SrlInvoker srlInvoker = SrlInvoker.createInstance();

    @Override
    public void map(WritableComparable key, WritableComparable value,
            OutputCollector output, Reporter reporter) throws IOException {
        try {
            srlInvoker.mapEntryPoint(key, value, output);
        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}
...
//Reducer implementation example
public class SrlReducer extends MapReduceBase implements
        org.apache.hadoop.mapred.Reducer<WritableComparable, WritableComparable, WritableComparable,
        WritableComparable> {

    private SrlInvoker srlInvoker = SrlInvoker.createInstance();

    @Override
    public void reduce(WritableComparable key, Iterator<WritableComparable> values,
            OutputCollector<WritableComparable, WritableComparable> output, Reporter reporter)
            throws IOException {
        List<WritableComparable> lValues = new ArrayList<WritableComparable>();
        while (values.hasNext()) {
            WritableComparable wc = values.next();
            lValues.add(wc);
        }
        WritableComparable[] aValues = new WritableComparable[lValues.size()];
        lValues.toArray(aValues);
        try {
            srlInvoker.reduceEntryPoint(key, aValues, output);
        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}

The following steps are used to set up the Hadoop deployment:

1. Verify that the following required files are in the correct locations:

• The Blaze deployment libraries.

• The generated POJO code.

• The Mapper and Reducer implementations.

• The Driver class that configures and runs the job. The input is read with
org.apache.hadoop.streaming.StreamXmlRecordReader, the output of the mappers is sorted/shuffled and
piped to the reducers, and the output of the reducers is written to a file. (A sketch of such a driver
appears after these steps.)

• The adb file.

• The server configuration file.

• A Blaze deployment license.

2. Copy the data file to HDFS.

3. Execute the Driver class. The following script shows how the Driver class is executed:

if [[ "upload" -eq "$1" ]]

then

echo "Exporting data files..."

hadoop dfs -rmr /user/guest/work

hadoop dfs -copyFromLocal /home/guest/workspace/RuleInvoker/work /user/guest/

fi

echo "Starting job..."

hadoop jar RuleInvoker.jar com.fico.example.Driver $*
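The article does not reproduce the Driver source, so the following hypothetical sketch shows one plausible way to assemble it with the classic mapred API. The stream.recordreader.* properties belong to the Hadoop streaming library, SrlMapper and SrlReducer are the classes shown earlier in this use case, and the package declaration, output types, and paths are assumptions rather than details from the original deployment.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.streaming.StreamInputFormat;

// Hypothetical sketch of the Driver class (assumed details noted above).
public class Driver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(Driver.class);
        conf.setJobName("blaze-loan-xml");

        // Treat each <LoanApplication>...</LoanApplication> element as one input record.
        conf.set("stream.recordreader.class",
                 "org.apache.hadoop.streaming.StreamXmlRecordReader");
        conf.set("stream.recordreader.begin", "<LoanApplication>");
        conf.set("stream.recordreader.end", "</LoanApplication>");
        conf.setInputFormat(StreamInputFormat.class);

        // SrlMapper and SrlReducer wrap the Blaze map and reduce entry points shown above.
        conf.setMapperClass(SrlMapper.class);
        conf.setReducerClass(SrlReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        // Placeholder paths; the upload step above copies the data to /user/guest/work.
        FileInputFormat.setInputPaths(conf, new Path("/user/guest/work"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/guest/output"));

        JobClient.runJob(conf);
    }
}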

Evaluation

This example demonstrates that Blaze Advisor can process XML records in Hadoop and that Blaze

Advisor can be deeply integrated with Hadoop MapReduce—more so than the first use case primarily

because of the reduce procedure. In this use case, all the records that share the same key are

passed into Blaze Advisor in the same invocation; therefore, it is easy to write business logic for summary

operations on these records. This use case, however, may not appeal to all users. This is because using

Hadoop data objects directly within Blaze Advisor rules removes the independence of the business

decision engine from the underlying execution platform.

Conclusion

Hadoop provides a powerful, scalable batch-processing platform for Big Data, and it provides another

deployment option for Blaze Advisor. Blaze Advisor can be integrated into the MapReduce paradigm

easily without any modifications.

When the data input contains no interdependencies, simple map-only jobs can be used for massive

parallel processing with Blaze Advisor providing the decision engine. When data aggregation is required

(for example, calculations based on all transactions for a given account), complete MapReduce functions

can be implemented based on the required processing. This scenario does require the Blaze Advisor

developers to have some knowledge of the MapReduce technique.