take the 1-2 day free consulting challenge, see trustworthy discovery of personally identifiable...


Upload: steven-meister

Post on 12-Feb-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1:

We enable you to uncover and govern the 2% of data that leads to problems and discourages the use of information when it counts.

Take the 1-2 Day Free Consulting Challenge and see trustworthy discovery of Personally Identifiable Information, Outlier/Risk, Columnar Metadata and more. Also experience an additional 30-day free trial.

BigDataRevealed (BDR) delivers discovery of Personally Identifiable Information, outlier anomalies and risk, dirty data, cataloging and columnar naming with metadata, processing lineage and more, for your Hadoop® cluster(s) or on the BigDataRevealed-VM (Apache™ Hadoop® and BDR complete, configured and ready in minutes). The more important BigDataRevealed deliverables are built into the Hadoop® ecosystem and framework. They are not an afterthought or add-on, which would cause efficiency issues and, more importantly, security issues, risks and exposures. Run BDR as the BDR-VM or install it in minutes in your live cluster(s).

Companies struggle to stay in compliance and to locate personally identifiable information (referred to as PII from here on).

Finding and eradicating the risks associated with housing PII data has become front-page news and is often extremely costly. Many organizations have identified discovery and control of PII data as their single most important initiative.

PII can include, but is not limited to: Social Security numbers, credit card numbers, email addresses, phone numbers, maiden names, addresses, bank account numbers, investment accounts, internal customer identifiers and more. Unfortunately, it can be anywhere: in misidentified fields, comments, documents or text blocks of any kind.

We have all read or heard stories in the news where hackers have stolen PII data, causing the company embarrassment and loss of customers. The company’s brand is then viewed negatively. Depending on regulatory constraints, the risks go beyond reputation and loyalty: the breach may spawn litigation costs and associated fines.

The PII data problems are not necessarily confined to systems housed in your organization. The data could be information that was enriched through social media and other sources outside of your direct control. BigDataRevealed provides efficient means and methods to identify and eradicate these risks. It offers the means to search and discover PII data of all types and formats. BDR delivers numerous pre-defined pattern searches and offers a simple mechanism for users to define patterns unique to their needs. These user-defined discoveries, as well as BDR’s pre-defined discoveries, are available to all users for collaboration and use.
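To make the idea of pre-defined and user-defined pattern searches concrete, here is a minimal Python sketch. It is not BDR’s implementation: the pattern table, the regular expressions, and the function names `find_pii` and `add_pattern` are all assumptions chosen for illustration, and real PII discovery needs far stricter validation than these simple expressions provide.

```python
import re

# Hypothetical pattern table. These regexes are illustrative only and are
# not BDR's own pre-defined patterns.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def find_pii(text, patterns=PII_PATTERNS):
    """Return a sorted list of (label, matched_text) pairs found in free text."""
    hits = []
    for label, pattern in patterns.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return sorted(hits)


def add_pattern(patterns, label, regex):
    """Return a copy of the pattern table extended with a user-defined pattern."""
    extended = dict(patterns)
    extended[label] = re.compile(regex)
    return extended


# A user-defined pattern for an internal customer identifier (made-up format).
custom_patterns = add_pattern(PII_PATTERNS, "customer_id", r"\bCUST-\d{6}\b")
hits = find_pii("Comment field: contact CUST-004217 about the refund", custom_patterns)
```

Because the scan runs over free text rather than a named column, the same approach applies to comments, documents and misidentified fields, which is where the text above says PII tends to hide.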

Page 2:

BigDataRevealed’s Outlier / Anomaly discovery, detection and notification/alert

Analyzing outliers is a process that looks for data exceptions beyond what was expected. Exceptions can be the result of fraud, machine failure, shrinkage, web intrusion and many other events.

Outliers result from a statistical process that identifies values that significantly exceed, or fall short of, the average value found in a file or series of similar files. Outliers can be found when analyzing a single data field or by analyzing points on a graph that result from plotting values from two related fields.

Both types of outliers may provide evidence of problems, or perhaps of the existence of a positive business trend you wish to understand and develop.
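As a rough illustration of the single-field case, the sketch below flags values that sit far above or below the average of a field. It uses a plain z-score with a threshold of three standard deviations; both the technique and the threshold are assumptions made for this example, not a description of how BDR computes outliers.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every value is identical, so nothing can be an outlier.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Nineteen ordinary readings and one extreme reading in the same field.
readings = [100] * 19 + [1000]
flagged = zscore_outliers(readings)
```

A z-score needs a reasonable number of values to be meaningful: with only a handful of rows, no single value can reach three standard deviations, so very small files call for a lower threshold or a different method.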

Examples where outlier analysis can help determine whether unexpected results warrant further analysis:

Fraud detection is constantly improving and maturing, as are the skills of those committing the fraud. The financial community has fraud detection algorithms in place today that resulted from vigorous statistical analysis of known fraudulent practices.

However, the informed criminal is certainly aware of these algorithms and will attempt to cleverly bypass detection with new fraudulent techniques. Outlier discovery may be the first indicator that new techniques are surfacing in your environment.

Transaction volume, transaction time of day, and transaction volume plotted against time of day are all data elements that can have outliers worthy of investigation.
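For the two-field case, one simple approach is to compare each transaction volume only against other volumes seen at the same hour. The sketch below does that with a per-hour z-score. The function name, the `(hour, volume)` input shape and the threshold are assumptions for illustration, and the data is made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_volume_outliers(observations, threshold=3.0):
    """Flag (hour, volume) pairs that are unusual for their own hour of day."""
    by_hour = defaultdict(list)
    for hour, volume in observations:
        by_hour[hour].append(volume)

    flagged = []
    for hour in sorted(by_hour):
        volumes = by_hour[hour]
        mu, sigma = mean(volumes), pstdev(volumes)
        if sigma == 0:
            # All volumes for this hour are identical.
            continue
        flagged.extend((hour, v) for v in volumes if abs(v - mu) / sigma > threshold)
    return flagged


# Made-up data: a volume of 200 is normal at 9:00 but unusual at 3:00.
observations = (
    [(9, 200)] * 19 + [(9, 2000)]
    + [(3, 5)] * 19 + [(3, 200)]
    + [(14, 300)] * 20
)
flagged = hourly_volume_outliers(observations)
```

The point of grouping by hour is that a volume can be ordinary in one context and suspicious in another, which a single-field check would miss.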

BigDataRevealed provides a big data file and columnar naming catalog, a data scientist’s workbench, and a vehicle for collecting and storing metadata for the big data environment.

Our rich metadata can be created in our environment or imported from the environment of your choice. With BigDataRevealed, you don’t have to become a data expert to find your next imperative.

Hadoop and HDFS are designed to store data, not column names, headers or metadata. Column names and headers are simply treated like the rest of the stored data.

So when data analysts, data scientists, privacy officers, risk management officers, ETL and BI developers and others go fishing in their data lakes for the specific files that hold the information they need for reports, BI, predictive analytics, pre-audits, and identification of Personally Identifiable Information or extreme outlier anomalies, the search becomes a long, arduous, inaccurate and nearly impossible task with today’s Hadoop technologies.

The user can add column names and metadata, use the suggested column names (if the discovery process found any), use a combination of both, or use none.
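The choice between user-supplied names, suggested names, both, or none can be pictured as a simple precedence rule per column position. The sketch below is only one way to express it: the function name, the precedence order (user first, then suggestion) and the `col_N` placeholder for unnamed columns are assumptions, not documented BDR behavior.

```python
def resolve_column_names(n_columns, user_names=None, suggested_names=None):
    """Pick one name per column: user-supplied, else suggested, else a placeholder.

    `user_names` and `suggested_names` map zero-based column positions to names.
    """
    user_names = user_names or {}
    suggested_names = suggested_names or {}
    resolved = []
    for position in range(n_columns):
        name = (
            user_names.get(position)
            or suggested_names.get(position)
            or f"col_{position}"  # hypothetical fallback for unnamed columns
        )
        resolved.append(name)
    return resolved


# The user names one column; discovery suggested names for three others.
names = resolve_column_names(
    4,
    user_names={0: "customer_id"},
    suggested_names={0: "id", 1: "email", 3: "ssn"},
)
```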

The BigDataRevealed Dictionary: this dictionary is created within the Hadoop framework and is stored only for retrieval by BDR and other third-party tools.

View rows of data in the file by column and select an existing BigDataRevealed naming dictionary entry. A user can also create a new entry in the naming dictionary.

After we save and bring up the table, we see that the column names, positions and metadata are now part of the file and column information held in the Hadoop HDFS framework.

The import process applies file and column names, with their associated metadata, to the Hadoop HDFS file. It can optionally load the column names and metadata into the BigDataRevealed dictionary, in addition to assigning them to the columns of the file.

After files are run through the BigDataRevealed processes that discover Personally Identifiable Information and column business classifications, the results can be added to the BDR Naming Dictionary as well as applied as the file’s columnar names.

Page 3:


The more important Hadoop®-related capabilities, offered as an installable application or as a fully configured VM with a complete Apache™ Hadoop®, and delivering on day one, are:

BigDataRevealed runs on the Hadoop native framework

Leverages existing investment in technology

Installs, implements and delivers on day one as an application or complete VM

Virtual machine OVA is downloadable and imports in minutes, with a graphical interface and no need for knowledge of the OS or Hadoop, or for a super technician. Delivers on day one a complete Hadoop® data discovery, compliance validation, anomalies, outlier detection/alerts/risk, and user-definable discovery

Consolidates files into logically named folders by subject or business reporting needs

A complete solution: repeatable, collaborative and extensible

Deep learning of emails, resumes and other formats, storing descriptive tags and summaries, with search and opening of the original document

Processes static HDFS and RDBMS data as well as live streaming data feeds with real-time discovery

Run with the BDR GUI or use callable modules within current production processes and ETL/BI processes

Reads/sources data directly: HDFS, Teradata, Oracle, DB2, MySQL, PDF, DOCX, HTML, Excel and more

Logs and lineage of users’ actions and results

If you’re local to Chicagoland, I will be glad to work at your company at no charge for the technology or my time; otherwise I will arrange to deliver the same remotely. With the technology, I can deliver on day one what would take 10 experts several months, and they would still get it wrong. I do not mean to come on strong, but I have been in this sector for 33 years and am an expert. When may we discuss this? It would be great to get everyone on a conference call: the Hadoop administrator, data management, and the C-level executives responsible for legal governance and PII exposures and risks.

Please view BigDataRevealed for Hadoop data compliance, outlier risks, metadata, deep learning of emails, resumes, RTF and more. Hadoop’s missing link is also offered as a complete VM, with BigDataRevealed and Apache Hadoop fully configured and delivering on day one.

www.bigdatarevealed.com (click on the Video menu for links to all these videos)

BigDataRevealed-VM complete with Apache™ Hadoop® http://bdrvmware.bigdatarevealed.net/bdrvm/BigDataRevealedVirtualMachine-Quickstart-v1.1.ova

ColumnNameingAndMetadataBigDataRevealedinTheHadoopFramework https://youtu.be/dmbE-NhP14k

BigDataRevealedPersonallyIdentifiableInformation https://youtu.be/13feADmBiCQ

BigDataRevealed Deep Learning Emails Resumes More https://youtu.be/k5hn4NtE1B8

BigDataRevealed’s Outlier / Anomaly discovery, detection and notification/alert 06 06 2016 https://youtu.be/-orK8J3U468

BigDataRevealed Consolidator Discovery Pattern Discovery https://youtu.be/xWYNzVCHDS4

Steve Meister 847-791-7838 [email protected]