Data Utility Requirements


Page 1: Data Utility Requirements

S.No | Functionality | Description | Data Processed
1 | Data auditing | Data is audited with the use of statistical and database methods to detect anomalies and contradictions; this eventually gives an indication of the characteristics of the anomalies and their locations. | No
2 | Workflow specification | The detection and removal of anomalies is performed by a sequence of operations on the data. | No
3 | Workflow execution | The workflow is executed after its specification is complete and its correctness is verified. The implementation of the workflow should be efficient, even on large sets of data, which inevitably poses a trade-off because the execution of a data-cleansing operation can be computationally expensive. | No
4 | Parsing | A parser decides whether a string of data is acceptable within the allowed data specification and detects syntax errors. | No
5 | Data transformation | The mapping of the data from its given format into the format expected by the appropriate application. This includes value conversions or translation functions, as well as normalizing numeric values to conform to minimum and maximum values. | No
6 | Data elimination | Requires an algorithm for determining whether data contains duplicate representations of the same entity. Usually, data is sorted by a key that brings duplicate entries closer together for faster identification (see the sketch below). | No
7 | Post-processing | After executing the cleansing workflow, the results are inspected to verify correctness. | No
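
The duplicate-elimination approach in row 6 (sort on a key so that duplicate representations of the same entity become adjacent, then compare neighbours) can be sketched as follows. This is illustrative only; the record fields ("email", "name") and the normalisation rule are assumptions, not part of the requirement.

```python
# Illustrative sketch: eliminate duplicate records by sorting on a key so
# that candidate duplicates become adjacent, then comparing neighbours.
# Field names and the normalisation rule are assumptions for the example.

def normalise_key(record: dict) -> str:
    """Build a sort key that brings likely duplicates together."""
    return (record.get("email", "") or record.get("name", "")).strip().lower()

def eliminate_duplicates(records: list[dict]) -> list[dict]:
    records = sorted(records, key=normalise_key)
    deduplicated = []
    previous_key = None
    for record in records:
        key = normalise_key(record)
        if key != previous_key:          # first occurrence of this entity
            deduplicated.append(record)
        previous_key = key
    return deduplicated

if __name__ == "__main__":
    rows = [
        {"name": "Ann Lee", "email": "ann@example.com"},
        {"name": "A. Lee", "email": "ANN@example.com "},   # duplicate after normalisation
        {"name": "Bob Ray", "email": "bob@example.com"},
    ]
    print(eliminate_duplicates(rows))   # two records remain
```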

Page 2: Data Utility Requirements

Remarks

Page 3: Data Utility Requirements

S.No | Functionality | Description | Data Processed
1 | Generic feeds | Feed will allow individual details from specific partners to be loaded.
2 | Transactional feed | Need to know data requirements for all partners/resources to feed in the transactional data.
3 | Product feeds | Processing of feeds related to products.
4 | Bespoke feeds | Processing of "ready to use" feeds.
5 | Subscription feeds | Processing of new consumers added.
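
The feed types above could be served by a single ingestion entry point that routes each incoming file to a type-specific handler. The sketch below is purely illustrative; the feed-type strings and handler functions are assumptions rather than part of the requirement.

```python
# Illustrative sketch: route incoming feed files to type-specific handlers.
# The feed-type names and handler functions are assumptions for the example.

from typing import Callable

def process_generic(path: str) -> None:
    print(f"loading individual details from partner feed {path}")

def process_transactional(path: str) -> None:
    print(f"loading transactional data from {path}")

def process_product(path: str) -> None:
    print(f"loading product feed {path}")

HANDLERS: dict[str, Callable[[str], None]] = {
    "generic": process_generic,
    "transactional": process_transactional,
    "product": process_product,
}

def ingest(feed_type: str, path: str) -> None:
    handler = HANDLERS.get(feed_type.lower())
    if handler is None:
        raise ValueError(f"unknown feed type: {feed_type}")
    handler(path)

ingest("product", "feeds/products.csv")
```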

Page 4: Data Utility Requirements

Remarks

Page 5: Data Utility Requirements

S.No | Functionality | Description | Data Processed | Remarks
1 | Cycle initiation | Periodic cycles.
2 | Build reference data | Referencing all data used.
3 | Extract | Extraction of data from …
4 | Transform | Clean, apply business rules, check for data integrity, create aggregates or disaggregates.
5 | Stage | Load data in staging tables (warehousing).
6 | Audit reports | Prepare audit report in compliance with business rules.
7 | Publish | Publish data to target tables.
8 | Archive | After publishing, maintaining metadata.
9 | Clean up | Deletion of duplicate/undesirable data.
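
A minimal sketch of the load cycle above (extract, transform, stage, audit, publish, then archive and clean up), assuming in-memory rows and placeholder data; none of the function bodies are prescribed by the requirement.

```python
# Illustrative sketch of the warehouse load cycle described above
# (extract -> transform -> stage -> audit -> publish). The rows, source
# names and function bodies are placeholders, not the required design.

def extract(source: str) -> list[dict]:
    # pull raw rows from a (hypothetical) source system
    return [{"id": 1, "amount": "10.5"}, {"id": 1, "amount": "10.5"}]

def transform(rows: list[dict]) -> list[dict]:
    # clean, apply business rules, check integrity, build aggregates
    return [{**r, "amount": float(r["amount"])} for r in rows]

def stage(rows: list[dict]) -> list[dict]:
    # load into staging tables; here just de-duplicate by id as a stand-in
    return list({r["id"]: r for r in rows}.values())

def audit(rows: list[dict]) -> None:
    print(f"audit report: {len(rows)} rows staged")

def publish(rows: list[dict]) -> None:
    print(f"publishing {len(rows)} rows to target tables")

def run_cycle(source: str) -> None:
    staged = stage(transform(extract(source)))
    audit(staged)
    publish(staged)
    # archive and clean-up steps would follow here

run_cycle("partner_feed")
```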

Page 6: Data Utility Requirements

S.No | Functionality | Description | Data Processed
1 | Data logging mechanism | Must contain a comprehensive log of all uploaded CRM data (e.g. details of all loaded files including load dates, filenames, errors) in dedicated database tables. The log history shall be retained on an indefinite basis.
2 | Deceased suppressions | Deceased individuals are identified via an external data cleansing process. This mechanism must allow updates via a user interface, list or API.
3 | Change updates | Must allow changes in email and postal address, contact numbers, etc.
4 | Improving data quality | Deceased individuals are identified via an external data cleansing process. This mechanism must allow updates via a user interface, list or API.
5 | Data modelling | A flexible and easily configurable data model for all CRM data stores. Can provide a visualisation of all objects present in the data model and allow changes (e.g. add table column, add new view) to be applied in real time.
6 | Admin interface | Allow administrators to manage all data warehouse feeds and support standard ETL features, i.e. data transformations, data workflows, data debugging, etc.
7 | Clickstream data integration | The unstructured data store shall be configured to ingest clickstream data (i.e. web behavioural data) sourced from a web analytics system (e.g. Google Analytics). The ingestion mechanism shall be able to stitch together sessions generated by the same user (e.g. based on cookie ID); see the sketch below.
8 | Admin interface | Allow administrators to manage all data warehouse feeds and support standard ETL features, i.e. data transformations, data workflows, data debugging, etc.
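
As an illustration of the session-stitching requirement in row 7, the sketch below groups clickstream hits by cookie ID and splits them into sessions on a 30-minute inactivity gap. The field names and the 30-minute threshold are assumptions, not part of the requirement.

```python
# Illustrative sketch: stitch clickstream hits into sessions per cookie ID.
# Field names and the 30-minute inactivity threshold are assumptions.

from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)

def stitch_sessions(hits: list[dict]) -> dict[str, list[list[dict]]]:
    """Group hits by cookie_id, then split each group into sessions."""
    by_cookie: dict[str, list[dict]] = defaultdict(list)
    for hit in hits:
        by_cookie[hit["cookie_id"]].append(hit)

    sessions: dict[str, list[list[dict]]] = {}
    for cookie_id, user_hits in by_cookie.items():
        user_hits.sort(key=lambda h: h["timestamp"])
        current: list[dict] = []
        user_sessions: list[list[dict]] = []
        for hit in user_hits:
            if current and hit["timestamp"] - current[-1]["timestamp"] > SESSION_GAP:
                user_sessions.append(current)
                current = []
            current.append(hit)
        if current:
            user_sessions.append(current)
        sessions[cookie_id] = user_sessions
    return sessions

hits = [
    {"cookie_id": "abc", "timestamp": datetime(2016, 1, 12, 9, 0), "page": "/home"},
    {"cookie_id": "abc", "timestamp": datetime(2016, 1, 12, 9, 10), "page": "/products"},
    {"cookie_id": "abc", "timestamp": datetime(2016, 1, 12, 11, 0), "page": "/home"},
]
print({k: len(v) for k, v in stitch_sessions(hits).items()})  # {'abc': 2}
```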

Page 7: Data Utility Requirements

Remarks

Page 8: Data Utility Requirements

S.No | Functionality | Description | Data Processed | Remarks
1 | Individual validation/deletion | Validate/delete emails, telephone numbers (landline and cell), and postal addresses for individuals via user interface, input lists or API.
2 | Individual multi-channel matching | The match mechanism ensures that individuals with the same or similar criteria are considered for merging into a single individual in the single customer view.
3 | Marketing consent update | If a record is a 'Yes' to marketing, but we receive the individual's details via an offline partner source where they have said 'No' to marketing, the marketing consent field must be updated (see the sketch below).
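
A minimal sketch of the consent rule in row 3, assuming a record dictionary with a "marketing_consent" field; the field and source names are illustrative only, not the required data model.

```python
# Illustrative sketch: apply an incoming 'No' marketing consent received from
# an offline partner source to the existing record. Field names are assumptions.

def update_marketing_consent(record: dict, incoming: dict) -> dict:
    """An opt-out received from any source overrides an existing opt-in."""
    if record.get("marketing_consent") == "Yes" and incoming.get("marketing_consent") == "No":
        record = {**record,
                  "marketing_consent": "No",
                  "consent_source": incoming.get("source", "offline partner")}
    return record

existing = {"customer_id": 42, "marketing_consent": "Yes"}
update = {"customer_id": 42, "marketing_consent": "No", "source": "offline partner"}
print(update_marketing_consent(existing, update))
# {'customer_id': 42, 'marketing_consent': 'No', 'consent_source': 'offline partner'}
```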

Page 9: Data Utility Requirements

S.No | Functionality | Description | Data Processed
1 | Data storage | Must include a bespoke enterprise relational data warehouse for the storage of CRM data.
2 | Data structure | A relational database technology which allows the structured/unstructured data to be queried and aggregated.
3 | Data history | The data warehouse must retain all historic supplied data in dedicated staging tables. This requirement is to allow retrospective analysis of supplied data if required.
4 | Blending of multiple data sources | The UI will contain data sourced from structured and unstructured data sources. To allow users to run queries across datasets, a mechanism for joining these disparate datasets must be available (see the sketch below).
5 | Audit history | Keep track of all updates to the system. The audit log shall include modified_date, username, record_ids, etc.
6 | Administration | The aggregated datastore must provide admin features that allow all aspects of an end user's account to be configured, including: role-based client access (e.g. Advanced user, Admin user, Standard user); permissions; the ability to define user groups with associated standard layout, queries, field configuration, etc.; enabling/disabling of accounts.
7 | Reporting | Reports to be created which can be refreshed and updated with new figures when required. These reports may use complex queries and cross-tabulations.
8 | Hypothesis testing | Support a hypothesis testing feature.
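
As one possible illustration of the blending requirement in row 4, the sketch below joins a structured CRM extract with an aggregate derived from unstructured clickstream data on a shared customer key. The key and field names are assumptions, not part of the requirement.

```python
# Illustrative sketch: join structured CRM rows with an aggregate derived
# from unstructured clickstream data. Keys and field names are assumptions.

from collections import Counter

crm_rows = [
    {"customer_id": "c1", "name": "Ann Lee", "segment": "Gold"},
    {"customer_id": "c2", "name": "Bob Ray", "segment": "Standard"},
]

clickstream_hits = [
    {"customer_id": "c1", "page": "/products"},
    {"customer_id": "c1", "page": "/checkout"},
    {"customer_id": "c2", "page": "/home"},
]

# Aggregate the unstructured side to one value per customer...
hits_per_customer = Counter(hit["customer_id"] for hit in clickstream_hits)

# ...then join onto the structured CRM rows.
blended = [
    {**row, "web_hits": hits_per_customer.get(row["customer_id"], 0)}
    for row in crm_rows
]
print(blended)
```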

Page 10: Data Utility Requirements

Remarks

Page 11: Data Utility Requirements

S.No | Functionality | Description | Data Processed | Remarks
1 | Cross product holding analysis | Analyse the most common relationships between more than two products.
2 | Cluster analysis | Segment individuals into distinct groups, based on defined variables.
3 | Basket analysis | Identify products most commonly purchased in the same transaction (see the sketch below).
4 | Linear and non-linear modelling | Both linear and non-linear regression modelling, including stepwise model selection, with examples of both regression types.
5 | Sentiment analysis | Analysis which will help to identify the level of positive/negative sentiment with respect to specific topics.
6 | Text analysis | Derive high-quality information such as patterns and trends from unstructured data.
7 | User data imports/exports | Ability to extract data quickly, easily and in various formats/layouts. Data may be exported for direct mail or telemarketing campaigns, as well as for further analysis/presentation outside the system.
8 | Data plot types | Graphical plotting of data, i.e. scatter plot, line graph, pie chart, bar chart, column chart, bubble chart, box plot, etc.
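
To illustrate the basket-analysis requirement in row 3, the sketch below counts product pairs that co-occur in the same transaction; the transaction layout and product names are assumptions.

```python
# Illustrative sketch: count product pairs bought in the same transaction
# (a simple form of basket analysis). The transaction layout is an assumption.

from collections import Counter
from itertools import combinations

transactions = [
    ["bread", "milk", "butter"],
    ["bread", "milk"],
    ["milk", "cereal"],
]

pair_counts: Counter = Counter()
for basket in transactions:
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# Most commonly co-purchased pairs first.
print(pair_counts.most_common(3))
# [(('bread', 'milk'), 2), (('bread', 'butter'), 1), (('butter', 'milk'), 1)]
```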

Page 12: Data Utility Requirements

9 | Roll up capability | Allow users to navigate through the CRM dataset from table to table via a "roll-up" feature. For example, a user may view details at the individual level (name, age) but then wish to view the sources that the individual has interacted with, e.g. website, store, office.
10 | Customer lifetime value | The analytical client must have the functionality to value and forecast customer lifetime value (over a period of time).
11 | Time series analysis and forecasting | The analytical client must support time series analysis and forecasting. For example, we will be able to predict website traffic for the next 3 years on a monthly basis. Time series analysis will allow us to identify the different components driving this, i.e. trend, seasonal impact, other (see the sketch below).
12 | Venn diagrams | Venn diagrams that allow rapid selections for very complex queries; e.g. members in the XYZ country who are marketable, male and aged 25-35 can be selected in 2 minutes or less with the current solution, without prior preparation.
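
As a toy illustration of the time-series requirement in row 11, the sketch below produces a naive seasonal forecast of monthly website traffic by repeating the same month from the previous year plus the average year-on-year change. The figures and the method are assumptions, not the required approach.

```python
# Illustrative sketch: naive seasonal forecast of monthly website traffic.
# The history figures and the forecasting method are assumptions only.

def seasonal_naive_forecast(monthly_history: list[float], months_ahead: int) -> list[float]:
    """Repeat the value from the same month last year, shifted by the
    average year-on-year change observed in the history."""
    assert len(monthly_history) >= 24, "need at least two full years of history"
    last_year = monthly_history[-12:]
    prev_year = monthly_history[-24:-12]
    avg_growth = sum(l - p for l, p in zip(last_year, prev_year)) / 12
    forecast = []
    for i in range(months_ahead):
        base = last_year[i % 12] + avg_growth * (i // 12 + 1)
        forecast.append(round(base, 1))
    return forecast

# Synthetic two-year history with a seasonal pattern and mild growth.
history = [100 + 10 * (m % 12) + 5 * (m // 12) for m in range(24)]
print(seasonal_naive_forecast(history, 6))
```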

Page 13: Data Utility Requirements

No / Structured / Unstructured / Both

Page 14: Data Utility Requirements

Yes / No