Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
TRANSCRIPT
WHAT IS HUE?
WEB INTERFACE FOR MAKING HADOOP EASIER TO USE
Suite of apps for each Hadoop component, like Hive, Pig, Impala, Oozie, Solr, Sqoop2, HBase...
ECOSYSTEM AND APPS
[Diagram: Hue and the components it connects to: YARN, JobTracker, Oozie, Pig, HDFS, HiveServer2, Hive Metastore, Cloudera Impala, Solr, HBase, Sqoop2, Zookeeper, LDAP, SAML, Hue Plugins]
TARGET OF HUE
GETTING STARTED WITH HADOOP
BEING PRODUCTIVE
EXPLORING DIFFERENT ANGLES OF THE PLATFORM
LET ANY USER FOCUS ON BIG DATA PROCESSING
BEING COMPATIBLE WITH ANY HADOOP VERSION (0.20/1.2.0/2.3.0)
OPEN SOURCE
~3000 COMMITS
33 CONTRIBUTORS
648 STARS
212 FORKS
github.com/cloudera/hue
THE CORE TEAM PLAYERS
team.gethue.com
ABRAHAM ELMAHREK
ROMAIN RIGAUX
ENRICO BERTI
CHANG BEER
TALKS
Meetups and events in NYC, Paris, LA, Tokyo, SF, Stockholm, Vienna, San Jose, Singapore…
Coming up in London, West Coast
AROUND THE WORLD
RETREATS
Nov 13 Koh Chang, Thailand
May 14 Curaçao, Netherlands Antilles
TREND: GROWTH
gethue.com
HISTORY
HUE 1
A desktop-like interface in the browser. It did its job but was pretty slow, leaked memory and was not very IE friendly, though it was definitely advanced for its time (2009-2010).
HISTORY
HUE 2.5
New apps and an improved UX, with nice new features like autocomplete and drag & drop.
HISTORY
HUE 3.5+
Where we are now: a new UI, several new apps, and the most user-friendly features to date.
WHICH DISTRIBUTION?
GITHUB (HACKER): Very latest
TARBALL (ADVANCED USER): Advanced preview
CDH / CM (NORMAL USER): The most stable and cross component checked
WHAT DO YOU NEED?
SERVER
Python 2.4 2.6
That’s it if using a packaged version. If building from the source, here are the extra packages
CLIENT
Web Browser: IE 9+, FF 10+, Chrome, Safari
WHAT DOES THE HUE SERVICE LOOK LIKE?
1 SERVER: Process serving pages and also static content
1 DB: For cookies, saved queries, workflows, …
HOW TO CONFIGURE HUE
HUE.INI
Similar to core-site.xml but with .INI syntax
Where?
/etc/hue/conf/hue.ini
or
$HUE_HOME/desktop/conf/pseudo-distributed.ini
[desktop]
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, or sqlite3
    engine=sqlite3
    ## host=
    ## port=
    ## user=
    ## password=
    name=desktop/desktop.db
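The nested `[section]` / `[[subsection]]` syntax is not understood by Python's standard `configparser`. As a rough illustration of how such a file maps to nested settings, here is a minimal hand-written parser. It is a sketch for reading the examples in these slides, not the parser Hue itself uses, and the function name `parse_nested_ini` is made up for this example.

```python
import re

# One or more '[' then a name then the same number of ']'.
SECTION = re.compile(r"^(\[+)\s*([^\[\]]+?)\s*(\]+)$")


def parse_nested_ini(text):
    """Parse hue.ini-style text into nested dicts (illustrative only)."""
    root = {}
    stack = [root]  # stack[d] is the dict of the open section at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments, including '## key=' defaults
        match = SECTION.match(line)
        if match:
            depth = len(match.group(1))
            if depth != len(match.group(3)) or depth > len(stack):
                raise ValueError("bad section header: " + line)
            del stack[depth:]  # close deeper and sibling sections
            stack.append(stack[-1].setdefault(match.group(2), {}))
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value: " + line)
        stack[-1][key.strip()] = value.strip()
    return root


SAMPLE = """
[desktop]
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, or sqlite3
    engine=sqlite3
    ## host=
    name=desktop/desktop.db
"""

config = parse_nested_ini(SAMPLE)
print(config["desktop"]["database"]["engine"])
```

Commented-out defaults such as `## host=` are ignored, so only the settings that are actually active end up in the result.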
AUTHENTICATE / LOGIN
[desktop]
  [[auth]]
    # - django.contrib.auth.backends.ModelBackend (entirely Django backend)
    # - desktop.auth.backend.AllowAllBackend (allows everyone)
    # - desktop.auth.backend.AllowFirstUserDjangoBackend
    # - desktop.auth.backend.LdapBackend
    # - desktop.auth.backend.OAuthBackend
    # ...
    ## backend=desktop.auth.backend.AllowFirstUserDjangoBackend
USERS
Permissions can be granted to, or revoked from, single users or groups of users
ADMIN USER
Regular user + permissions
LDAP BACKEND
Integrate your employees: LDAP How to guide
LIST OF GROUPS AND PERMISSIONS
A permission can:
- allow access to one app (e.g. Hive Editor)
- modify data from the app (e.g. drop Hive tables or edit cells in HBase Browser)
CONFIGURE APPS AND PERMISSIONS
A list of permissions
PERMISSIONS IN ACTION
User ‘test’ belonging to the group ‘hiveonly’ that has just the ‘hive’ permissions
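The group-based model described here can be pictured with a small sketch. This is a hypothetical data model written for illustration, not Hue's actual implementation: the class names and the `can_access` method are assumptions, and only the user `test`, the group `hiveonly` and the `hive` permission come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    permissions: set = field(default_factory=set)  # app names, e.g. {"hive"}


@dataclass
class User:
    name: str
    groups: list = field(default_factory=list)

    def can_access(self, app):
        """A user may open an app if any of their groups grants it."""
        return any(app in group.permissions for group in self.groups)


hiveonly = Group("hiveonly", {"hive"})
test = User("test", [hiveonly])

print(test.can_access("hive"))   # the group grants 'hive'
print(test.can_access("hbase"))  # no group grants 'hbase'
```

Revoking a permission in this model is just removing the app name from the group's set, which affects every member of the group at once.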
CONFIGURE APPS AND PERMISSIONS
HOW HUE INTERACTS WITH HADOOP
[Diagram: Hue connected to YARN, JobTracker, Oozie, Hue Plugins, LDAP, SAML, Pig, HDFS, HiveServer2, Hive Metastore, Cloudera Impala, Solr, HBase, Sqoop2, Zookeeper]
RPC CALLS TO ALL THE HADOOP COMPONENTS
HDFS EXAMPLE
[Diagram: WebHDFS REST API in front of the NameNode (NN) and the DataNodes (DN)]
http://localhost:50070/webhdfs/v1/<PATH>?op=LISTSTATUS
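As a rough sketch of what a client does with this endpoint, the snippet below builds a `LISTSTATUS` URL and reads the JSON shape that WebHDFS returns for a directory listing. The path `/user/hue`, the helper names and the sample response content are assumptions made for the example; no request is actually sent.

```python
import json
from urllib.parse import quote, urlencode

BASE = "http://localhost:50070/webhdfs/v1"  # NameNode WebHDFS endpoint from the slide


def webhdfs_url(base, path, op, **params):
    """Build a WebHDFS REST URL of the form <base><path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return base.rstrip("/") + quote(path) + "?" + query


def list_names(response_text):
    """Return (name, type) pairs from a LISTSTATUS JSON response."""
    statuses = json.loads(response_text)["FileStatuses"]["FileStatus"]
    return [(status["pathSuffix"], status["type"]) for status in statuses]


url = webhdfs_url(BASE, "/user/hue", "LISTSTATUS")

# Hand-written sample in the LISTSTATUS response format (hypothetical entries).
SAMPLE_RESPONSE = json.dumps({
    "FileStatuses": {"FileStatus": [
        {"pathSuffix": "data.csv", "type": "FILE"},
        {"pathSuffix": "logs", "type": "DIRECTORY"},
    ]}
})

entries = list_names(SAMPLE_RESPONSE)
print(url)
print(entries)
```

Against a live cluster, the response text would come from something like `urllib.request.urlopen(url).read()` instead of the sample string.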
HOW
The host and port of each service API (Oozie, YARN, HDFS, HBase…) are specified in hue.ini, in a section per major service (e.g. [hbase]), for Hue core ([desktop]) or for a Hue lib ([liboozie])
[hbase]
  # Comma-separated list of HBase Thrift servers for
  # clusters in the format of '(name|host:port)'.
  hbase_clusters=(Cluster|localhost:9090)
[liboozie]
  # The URL where the Oozie service runs on.
  # oozie_url=http://hue.ent.cloudera.com:11000/oozie
RPC CALLS TO ALL THE HADOOP COMPONENTS
Full list
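The `'(name|host:port)'` format of `hbase_clusters` can be unpacked as follows. This is a minimal sketch of the format as described in the comment, not Hue's own parsing code; the second cluster `(Backup|hbase2:9090)` is a hypothetical value added to show the comma-separated case.

```python
import re

# One '(name|host:port)' entry; entries are separated by commas.
CLUSTER = re.compile(r"\(([^|()]+)\|([^:()]+):(\d+)\)")


def parse_hbase_clusters(value):
    """Turn '(name|host:port),(name|host:port)' into (name, host, port) tuples."""
    return [
        (name.strip(), host.strip(), int(port))
        for name, host, port in CLUSTER.findall(value)
    ]


single = parse_hbase_clusters("(Cluster|localhost:9090)")
several = parse_hbase_clusters("(Cluster|localhost:9090),(Backup|hbase2:9090)")
print(single)
print(several)
```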
KERBEROS
1 Hue ticket / principal - no user ticket
Hue uses its ticket for authenticating to every other service (HDFS, Oozie, …)
Read more in the Hue Security Guide
HUE KERBEROS TICKET
Add Hue user principal to Kerberos:
kadmin: addprinc -randkey hue/FQDN@REALM
Test:
$ kinit -k -t /etc/hue/hue.keytab hue/FQDN@REALM
Ticket should be renewable (krb5.conf and kdc.conf)
hue.ini:
[desktop]
  [[kerberos]]
    # Path to Hue's Kerberos keytab file
    hue_keytab=/etc/hue/hue.keytab
    # Kerberos principal name for Hue
    hue_principal=hue/FQDN@REALM
    # add kinit path for non root users
    kinit_path=/usr/kerberos/bin/kinit
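To show how the three Kerberos settings relate to the `kinit` test command, the sketch below assembles the same command line from a dict of settings. The function name and the dict are assumptions made for illustration; the command is only built, not executed.

```python
def kinit_command(kerberos_conf):
    """Return the kinit argument list that gets a ticket from Hue's keytab."""
    return [
        kerberos_conf.get("kinit_path", "kinit"),  # fall back to kinit on PATH
        "-k",                                      # authenticate with a keytab
        "-t", kerberos_conf["hue_keytab"],         # which keytab file to use
        kerberos_conf["hue_principal"],            # principal to get a ticket for
    ]


conf = {
    "hue_keytab": "/etc/hue/hue.keytab",
    "hue_principal": "hue/FQDN@REALM",  # placeholder, as in the hue.ini example
    "kinit_path": "/usr/kerberos/bin/kinit",
}

command = kinit_command(conf)
print(" ".join(command))
```

With a real principal, the list could be passed to `subprocess.run(command, check=True)` to verify that the keytab works.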
HOW
Hue is a “super proxy”. The client could be on a Windows machine, a phone… and interact with all the Hadoop services
IMPERSONATION
Call for getting the information about an HDFS file:
http://localhost:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hue&doas=bob
WebHDFS, add to core-site.xml:
<!-- Hue WebHDFS proxy user setting -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
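The impersonation call above carries two identities: `user.name` is the proxy user making the request (Hue) and `doas` is the end user it acts for. A minimal sketch of building and inspecting such a URL, with a made-up helper name, could look like this:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def impersonated_url(base, path, op, proxy_user, end_user):
    """Build a WebHDFS URL where proxy_user acts on behalf of end_user."""
    query = urlencode({"op": op, "user.name": proxy_user, "doas": end_user})
    return f"{base}{path}?{query}"


url = impersonated_url(
    "http://localhost:50070/webhdfs/v1", "/tmp", "GETFILESTATUS", "hue", "bob"
)

# Read the two identities back out of the query string.
params = parse_qs(urlsplit(url).query)
print(url)
print(params["user.name"][0], "acts as", params["doas"][0])
```

The request is only accepted if the `hadoop.proxyuser.hue.*` properties allow the `hue` user to impersonate others from that host and for that group.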
OTHER SECURITY FEATURES
HTTPS
SSL DB
SSL WITH HIVESERVER2
AUDITING
READ MORE …
HIGH AVAILABILITY
2 Hue instances
HA proxy
Multi DB
Performance: like a website, mostly RPC calls
HOW
SUM-UP
INSTALL: Install Hue on one machine + Hue Kerberos ticket
CONFIGURE: Configure hue.ini to point to each Service API
ENABLE: Enable Hadoop Service APIs for Hue as a proxy user
HELP: Get help on @gethue or hue-user
LDAP: Use an LDAP backend
CONFIGURATIONS ARE HARD…
…GIVE CLOUDERA MANAGER A TRY!
vimeo.com/91805055
LINKS
@gethue
USER GROUP
hue-user@
WEBSITE
http://gethue.com
LEARN
http://learn.gethue.com
GET HUE
TARBALL: Try the latest and greatest in advance, but you’ll have to configure everything on your own.
CLOUDERA’S DEMO VM: Get to play with Hue and various Hadoop components in 5 minutes. It’s a self-contained CDH environment, ready to use.
CLOUDERA’S CDH: Stable and highly tested releases perfectly integrated with the Hadoop ecosystem, automagically configured by Cloudera Manager.
HORTONWORKS*: In HDP there’s an old forked version of Hue 2.3.
MAPR*: Newer version than HDP, close to the original 2.5 minus apps like HBase, Impala, Sqoop, Search.
HP CLOUD*: The newest addition, ships Hue 3.0 through the GreenButton products.
BIGTOP
EMBEDDED/DEMO IN IND. COMPANIES
* YOUR MILEAGE MAY VARY.
WHAT ARE YOUR USE CASES?
WHICH COMPONENTS DO YOU USE?
WHAT WOULD YOU LIKE TO SEE IN HUE?
INTERESTED IN CONTRIBUTING?
WANNA SAY HELLO?
DO YOU WANT A TAILOR-MADE TEAM RETREAT?
QUESTIONS?
TEAM@GETHUE.COM
HOW
HDFS FILE BROWSER
Add Hue as a WebHDFS proxy user, using the hadoop.proxyuser.hue.* properties in core-site.xml shown earlier.
Add the following property to hdfs-site.xml to enable WebHDFS in the NameNode and DataNodes:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
hue.ini:
[hadoop]
  [[hdfs_clusters]]
    # HA support by using HttpFs
    [[[default]]]
      # Enter the filesystem uri
      ##fs_defaultfs=hdfs://localhost:8020
      # Use WebHdfs/HttpFs as the communication mechanism.
      ##webhdfs_url=http://localhost:50070/webhdfs/v1
HOW
Example of config for having Hue interact with Yarn
[hadoop]
  [[yarn_clusters]]
    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=localhost
      # The port where the ResourceManager IPC listens on
      ## resourcemanager_port=8032
      # Whether to submit jobs to this cluster
      submit_to=True
      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false
      # URL of the ResourceManager API
      ## resourcemanager_api_url=http://localhost:8088
      # URL of the ProxyServer API
      ## proxy_api_url=http://localhost:8088
      # URL of the HistoryServer API
      # history_server_api_url=http://localhost:19888
    [[[ha]]]
      # Enter the host on which you are running the failover Resource Manager
      resourcemanager_api_url=http://localhost:8088
      ## logical_name=
      submit_to=True
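To give an idea of what `resourcemanager_api_url` points at, the sketch below builds the ResourceManager REST URL for listing applications and reads application states from a response in that API's JSON shape. Whether Hue calls exactly this endpoint is an assumption here; the helper names and the sample application entries are invented for the example, and no request is sent.

```python
import json


def apps_url(resourcemanager_api_url):
    """URL of the ResourceManager REST resource that lists applications."""
    return resourcemanager_api_url.rstrip("/") + "/ws/v1/cluster/apps"


def app_states(response_text):
    """Map application id to state from a cluster/apps JSON response."""
    apps = (json.loads(response_text).get("apps") or {}).get("app", [])
    return {app["id"]: app["state"] for app in apps}


url = apps_url("http://localhost:8088")

# Hand-written sample in the response format (hypothetical applications).
SAMPLE_RESPONSE = json.dumps({
    "apps": {"app": [
        {"id": "application_0001", "name": "wordcount", "state": "RUNNING"},
        {"id": "application_0002", "name": "pi", "state": "FINISHED"},
    ]}
})

states = app_states(SAMPLE_RESPONSE)
print(url)
print(states)
```

The `or {}` guard covers a cluster with no applications, where the `apps` field comes back as null.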
YARN / MR2
HOW
Based on HiveServer2 interface
Note for Hive:
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
Video demo
Setup tutorial
[beeswax]
  # Host where Hive server Thrift daemon is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  ## hive_server_host=localhost
  ## hive_server_port=10000
  # Hive configuration directory, where hive-site.xml is located
  ## hive_conf_dir=/etc/hive/conf
HIVE (IMPALA / SHARK)
HOW
Make sure the share lib is installed
Alternative Dashboard and Editors
[liboozie]
  #oozie_url=http://localhost.com:11000/oozie
OOZIE
HOW
Comes with Oozie, no PigServer yet
Oozie sharelib
Oozie credentials for security
PIG