HDFS connector API reference part 1
Introduction
• Hadoop Distributed File System (HDFS) Connector.
Requirement Required?
Requires Mule Enterprise License Yes
Requires Entitlement No
Mule Version 3.6.0 or higher
Kerberos Configuration
• <hdfs:config-with-kerberos>
• Connection Management
• Kerberos authentication configuration. Here you can configure the properties required by "Kerberos Authentication" to establish a connection with the Hadoop Distributed File System.
Kerberos Configuration - Attributes
Name Java Type Description
name String The name of this configuration. Other elements can reference the configuration by this name.
nameNodeUri String The name of the file system to connect to. It is passed to the HDFS client as the FileSystem#FS_DEFAULT_NAME_KEY configuration entry. It can be overridden by values in configurationResources and configurationEntries.
keytabPath String Path to the keytab file associated with username. It is used to obtain a TGT from the "Authorization server". If not provided, the connector looks for a TGT associated with username in your local Kerberos cache.
username String A simple user identity of a client process. It is passed to the HDFS client as the "hadoop.job.ugi" configuration entry. It can be overridden by values in configurationResources and configurationEntries.
configurationResources List<String> A list of configuration resource files to be loaded by the HDFS client. Here you can provide additional configuration files (e.g. core-site.xml).
configurationEntries Map<String,String> A map of configuration entries to be used by the HDFS client. Here you can provide additional configuration entries as key/value pairs.
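A minimal sketch of a Kerberos configuration using the attributes listed above. The configuration name and the values of nameNodeUri, username, and keytabPath are placeholders rather than values from this reference, and the element is assumed to sit inside a Mule configuration file that already declares the hdfs namespace.

```xml
<!-- Hypothetical values: replace the URI, principal, and keytab path with your own. -->
<hdfs:config-with-kerberos name="hdfs-kerberos-conf"
    nameNodeUri="hdfs://namenode.example.com:8020"
    username="mule-user@EXAMPLE.COM"
    keytabPath="/etc/security/keytabs/mule-user.keytab"/>
```

If keytabPath is omitted, the keytabPath description indicates that a TGT for username is looked up in the local Kerberos cache instead.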
Simple Configuration
• <hdfs:config>
• Connection Management
• Simple authentication configuration. Here you can configure the properties required by "Simple Authentication" to establish a connection with the Hadoop Distributed File System.
Simple Configuration - Attributes
Name Java Type Description
name String The name of this configuration. Other elements can reference the configuration by this name.
nameNodeUri String The name of the file system to connect to. It is passed to the HDFS client as the FileSystem#FS_DEFAULT_NAME_KEY configuration entry. It can be overridden by values in configurationResources and configurationEntries.
username String A simple user identity of a client process. It is passed to the HDFS client as the "hadoop.job.ugi" configuration entry. It can be overridden by values in configurationResources and configurationEntries.
configurationResources List<String> A list of configuration resource files to be loaded by the HDFS client. Here you can provide additional configuration files (e.g. core-site.xml).
configurationEntries Map<String,String> A map of configuration entries to be used by the HDFS client. Here you can provide additional configuration entries as key/value pairs.
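A simple-authentication configuration might look like the sketch below. The name hdfs-conf matches the config-ref used in the XML sample of this reference; the nameNodeUri and username values are illustrative assumptions, and the hdfs namespace is assumed to be declared on the enclosing Mule configuration.

```xml
<!-- Hypothetical NameNode address and user; adjust to your cluster. -->
<hdfs:config name="hdfs-conf"
    nameNodeUri="hdfs://localhost:9000"
    username="mule-user"/>
```

Processors then point at this configuration with config-ref="hdfs-conf".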
Processors
• Read from Path
  – <hdfs:read-operation>
• XML Sample
  – <hdfs:read-operation path="/tmp/test.dat" bufferSize="8192" config-ref="hdfs-conf"/>
Read from Path - Attributes
Name Java Type Description
config-ref String Specify which config to use
path String the path of the file to read.
bufferSize int the buffer size to use when reading the file
Returns
Return Java Type Description
InputStream the result from executing the rest of the flow
Get Path Metadata
• <hdfs:get-metadata>
• Sets the following flow variables:
  – hdfs.path.exists - Indicates whether the path exists (true or false)
  – hdfs.content.summary - A summary of the path info
  – hdfs.file.checksum - MD5 digest of the file (if it is a file and exists)
  – hdfs.file.status - A Hadoop object that contains info about the status of the file (org.apache.hadoop.fs.FileStatus)
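The flow variables listed above could be consumed as in the following sketch. This reference does not list the attributes of <hdfs:get-metadata>, so path and config-ref are assumptions modeled on the other processors; the flow name is hypothetical, and the choice router assumes hdfs.path.exists holds a boolean.

```xml
<flow name="hdfs-metadata-example">
    <!-- Assumed attributes: path and config-ref, mirroring the other processors. -->
    <hdfs:get-metadata path="/tmp/test.dat" config-ref="hdfs-conf"/>
    <choice>
        <!-- Route on the hdfs.path.exists flow variable set by the processor. -->
        <when expression="#[flowVars['hdfs.path.exists']]">
            <logger level="INFO" message="Checksum: #[flowVars['hdfs.file.checksum']]"/>
        </when>
        <otherwise>
            <logger level="INFO" message="Path does not exist"/>
        </otherwise>
    </choice>
</flow>
```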
Write to Path
• <hdfs:write>
• Write the current payload to the designated path, either creating a new file or appending to an existing one.
Write to Path - Attributes
Name Java Type Description
config-ref String Specify which config to use
path String the path of the file to write to.
permission String the file system permission to use if a new file is created, either in octal or symbolic format (umask).
overwrite boolean if a pre-existing file should be overwritten with the new content.
bufferSize int the buffer size to use when appending to the file.
replication int block replication for the file.
blockSize long the block size to use for the file.
ownerUserName String the username owner of the file.
ownerGroupName String the group owner of the file.
payload InputStream the payload to write to the file.
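As an illustration of the attributes above, a write processor might be declared as follows. The path and the attribute values are placeholders, hdfs-conf is assumed to be a configuration defined elsewhere in the same file, and the payload attribute is left out on the assumption that the processor then uses the flow's current payload.

```xml
<!-- Writes the current payload to /tmp/test.dat, overwriting any existing file. -->
<hdfs:write config-ref="hdfs-conf" path="/tmp/test.dat"
    permission="700" overwrite="true" bufferSize="4096"/>
```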
Append to File
• <hdfs:append>
• Append the current payload to a file located at the designated path.
• To be able to append data to an existing file, refer to the dfs.support.append configuration parameter.
Append to File - Attributes
Name Java Type Description
config-ref String Specify which config to use
path String the path of the file to write to.
bufferSize int the buffer size to use when appending to the file.
payload InputStream the payload to append to the file.
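A corresponding append processor could be sketched like this. The path and buffer size are illustrative, hdfs-conf is assumed to be defined elsewhere, and the payload attribute is omitted on the assumption that the current payload is appended.

```xml
<!-- Appends the current payload to an existing file; the cluster must allow appends. -->
<hdfs:append config-ref="hdfs-conf" path="/tmp/test.dat" bufferSize="4096"/>
```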
Delete File - Attributes
Name Java Type Description
config-ref String Specify which config to use
path String the path of the file to delete.
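This reference lists only the attributes of the delete processor, not its element name. The sketch below assumes the element is <hdfs:delete-file>, following the naming of the other processors; check the connector schema before relying on it. The path is a placeholder.

```xml
<!-- Assumed element name; config-ref and path come from the attribute table above. -->
<hdfs:delete-file config-ref="hdfs-conf" path="/tmp/test.dat"/>
```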