Hadoop Data Tagging and Metadata Extension

Hadoop Based SQL and Big Data Analytics Solution

Upload: queryio

Post on 14-Dec-2014


Category:

Technology



DESCRIPTION

QueryIO provides advanced manual and automated data tagging, which lets you define properties for files as they are written to HDFS. It automatically stores the basic metadata of files stored in HDFS and extends the metadata layer by letting you define additional metadata. It understands dozens of file formats, including PDF, XLS, and DOC documents, images, audio, and video files.

TRANSCRIPT

Page 1: Hadoop Data Tagging and Metadata Extension

Page 7: Hadoop Data Tagging and Metadata Extension

What is MetaData?

• Metadata is simply “data about data”.

• In a file system, metadata is the information about a file, such as its size, creation time, last-modified time, type, and owner.

• The file system manages access to both the content of files and the metadata about those files.

• Metadata characterizes data. It provides documentation so that data can be understood and more readily consumed by your organization. Metadata answers the who, what, when, where, why, and how questions for users of the data.
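As a rough illustration of file-system metadata, the sketch below reads the size, modification time, owner id, and extension of a file. It uses Python's `os.stat` on a local file rather than HDFS, and the helper names `describe_file` and `demo` are hypothetical, chosen only for this example.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return basic file-system metadata for a local file as a dict."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Last-modified time, converted to a timezone-aware UTC datetime.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # Numeric owner id (always 0 on Windows).
        "owner_uid": info.st_uid,
        # File type is approximated here by the lower-cased extension.
        "extension": os.path.splitext(path)[1].lstrip(".").lower(),
    }


def demo():
    """Create a small temporary file and describe it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "report.TXT")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("hello")
        return describe_file(path)
```

None of these values come from the file's content; they describe the file itself, which is what makes them metadata.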

Page 12: Hadoop Data Tagging and Metadata Extension

MetaData Extension with QueryIO

• QueryIO provides an on-ingest metadata extraction service: extended metadata is extracted from files as they are ingested, so you don't need to run costly batch jobs later. This makes unstructured data on the cluster searchable as soon as it is ingested.

• To help make unstructured data searchable on a Hadoop cluster, QueryIO stores the metadata for each file stored on Hadoop in a relational database.

• Because all the metadata and tags associated with a file are kept in a relational database, you can use existing SQL-based infrastructure to search the data on the Hadoop cluster.

• It understands dozens of file formats, including PDF, XLS, and DOC documents, images, audio, and video files.
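The idea of searching cluster files through SQL can be sketched with a small relational catalog. This is only an illustration: SQLite stands in for whichever database QueryIO actually uses, and the table name `file_metadata`, its columns, and the sample rows are all assumptions made for the example.

```python
import sqlite3


def build_catalog():
    """Create an in-memory catalog with one row of metadata per file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_metadata ("
        " path TEXT PRIMARY KEY, file_type TEXT, size_bytes INTEGER, owner TEXT)"
    )
    # Hypothetical sample rows; a real catalog would be filled on ingest.
    rows = [
        ("/data/reports/q1.pdf", "pdf", 48213, "alice"),
        ("/data/reports/q2.pdf", "pdf", 1200, "bob"),
        ("/data/media/intro.mp4", "mp4", 9800000, "alice"),
    ]
    conn.executemany("INSERT INTO file_metadata VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def find_files(conn, file_type, min_size):
    """Return the paths of files of a given type at or above a size threshold."""
    cursor = conn.execute(
        "SELECT path FROM file_metadata"
        " WHERE file_type = ? AND size_bytes >= ? ORDER BY path",
        (file_type, min_size),
    )
    return [row[0] for row in cursor]
```

For example, `find_files(build_catalog(), "pdf", 10000)` answers "which PDF files are at least 10,000 bytes?" without reading any file content from the cluster.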

Page 17: Hadoop Data Tagging and Metadata Extension

What are Data Tags?

• A tag is a label attached to someone or something for the purpose of identification or to give other information.

• A data tag is a tag attached to data or a file to provide extra information about it.

• Data tags can be used to categorize data by various criteria, which helps manage vast amounts of data. The data can then be extracted, sorted, and processed based on these categories.

• Adding data tags to data, either based on some condition or unconditionally, is called data tagging.
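Categorizing files by tag can be pictured as grouping paths by the value of one tag. The sketch below assumes tags are held as a plain dictionary per file; the function name `group_by_tag` and that data layout are hypothetical and are not taken from QueryIO.

```python
from collections import defaultdict


def group_by_tag(file_tags, tag_name):
    """Group file paths by the value of one tag.

    file_tags maps each path to a dict of {tag name: tag value}.
    Files that do not carry the tag are left out of the result.
    """
    groups = defaultdict(list)
    for path, tags in file_tags.items():
        if tag_name in tags:
            groups[tags[tag_name]].append(path)
    # Sort each group so the output is stable.
    return {value: sorted(paths) for value, paths in groups.items()}
```

Once files are grouped this way, each category can be extracted, sorted, or processed on its own.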

Page 22: Hadoop Data Tagging and Metadata Extension

Data Tagging with QueryIO

• QueryIO provides advanced manual and automated data tagging, which lets you define properties for files as they are written to HDFS. It automatically stores the basic metadata of files stored in HDFS and extends the metadata layer by letting you define additional metadata (data tags).

• Here again, the tags defined for all files on the cluster are stored in a relational database. QueryIO keeps the metadata and tags stored in the database in sync with the files stored on the Hadoop cluster.

• Data tagging lets you define a data tag and an operator to be applied to files on the cluster. You can define data tags using a table you have already created with Hive DDL, or choose the system-defined MetaStore tables for different file formats. You can also provide expressions to apply tags conditionally, either on ingest or at a scheduled time.
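The slides do not describe how the database is kept in sync with the cluster. Purely as an illustration of the idea, the sketch below reconciles an in-memory tag catalog against a list of paths currently on the cluster; the function `reconcile` and the dictionary-based catalog are assumptions made for this example, not QueryIO's mechanism.

```python
def reconcile(catalog, cluster_paths):
    """Make the catalog's keys match the paths currently on the cluster.

    catalog maps each path to a dict of tags and is updated in place.
    Returns (added, removed) as sorted lists of paths.
    """
    current = set(cluster_paths)
    known = set(catalog)
    removed = sorted(known - current)
    added = sorted(current - known)
    for path in removed:
        # The file no longer exists, so its tags are dropped too.
        del catalog[path]
    for path in added:
        # A new file starts with no tags until rules or users add some.
        catalog[path] = {}
    return added, removed
```

Tags on files that still exist are left untouched, so a reconciliation pass only affects entries for files that appeared or disappeared.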

Page 29: Hadoop Data Tagging and Metadata Extension

Unconditional Data Tagging

• QueryIO provides both conditional and unconditional tagging. You can choose to tag hand-picked files or the files in a particular folder.

• To do so, open the HDFS data browser, choose the files you want to tag, and click the “add tag” button.

• Unconditional tagging is useful when you already know the HDFS location of the files you want to tag and the tagging does not depend on other file attributes such as file type or file length.
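In QueryIO this is done through the HDFS data browser. The same idea can be sketched in code as tagging by location only, with no look at file attributes or content. The functions `tag_paths` and `tag_folder` and the dictionary-based catalog below are hypothetical, introduced only for illustration.

```python
def tag_paths(catalog, paths, tag_name, tag_value):
    """Attach one tag to each of the hand-picked paths."""
    for path in paths:
        catalog.setdefault(path, {})[tag_name] = tag_value


def tag_folder(catalog, folder, tag_name, tag_value):
    """Attach one tag to every cataloged file under a folder.

    Returns the sorted list of paths that were tagged.
    """
    # The trailing slash stops "/logs/2014" from matching "/logs/2014-old".
    prefix = folder.rstrip("/") + "/"
    matches = [path for path in catalog if path.startswith(prefix)]
    tag_paths(catalog, matches, tag_name, tag_value)
    return sorted(matches)
```

Both functions select files purely by path, which is what distinguishes unconditional tagging from rules that inspect attributes or content.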

Page 35: Hadoop Data Tagging and Metadata Extension

Conditional Data Tagging

• Conditions can be defined using the values of file attributes (e.g., if Length > 1000) or by parsing the content of the file (e.g., if NumberOfLines > 100).

• The tag value can also be obtained by parsing the content of the file (e.g., NumberOfWords, ifWord(“QueryIO”)Exist, ifPatternExists).

• Conditional data tags can be added to chosen file types or to all files present on the HDFS cluster.
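A minimal sketch of conditional tagging follows, assuming each rule is a (tag name, condition, value) triple evaluated against a file's attributes and text. The names `Length`, `NumberOfLines`, and `NumberOfWords` come from the slides; the rule format, the functions `content_stats` and `apply_rules`, and the list `EXAMPLE_RULES` are assumptions for this example and do not reflect QueryIO's actual expression syntax.

```python
import re


def content_stats(text):
    """Derive content-based values by parsing the file text."""
    return {
        "NumberOfLines": len(text.splitlines()),
        "NumberOfWords": len(text.split()),
    }


def apply_rules(attributes, text, rules):
    """Return the tags whose conditions hold for one file."""
    # Conditions can see both file attributes and parsed content values.
    context = {**attributes, **content_stats(text)}
    tags = {}
    for tag_name, condition, value_of in rules:
        if condition(context, text):
            tags[tag_name] = value_of(context, text)
    return tags


EXAMPLE_RULES = [
    # Condition on a file attribute.
    ("large", lambda ctx, text: ctx["Length"] > 1000,
     lambda ctx, text: True),
    # Condition on parsed content; the tag value also comes from the content.
    ("word_count", lambda ctx, text: ctx["NumberOfLines"] > 100,
     lambda ctx, text: ctx["NumberOfWords"]),
    # Tag applied only if a given word exists in the file.
    ("mentions_queryio", lambda ctx, text: "QueryIO" in text.split(),
     lambda ctx, text: True),
    # Tag applied only if a pattern (here an ISO-style date) exists.
    ("has_iso_date",
     lambda ctx, text: re.search(r"\d{4}-\d{2}-\d{2}", text) is not None,
     lambda ctx, text: True),
]
```

To restrict rules to chosen file types, a caller could simply run `apply_rules` only for files whose type matches, and run it for every file otherwise.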
