I often get asked for a safe and quick way to secure a cluster against unauthorized access, without big steps like integrating Active Directory. This writeup walks through the steps you need. I used CentOS 6.3 and CDH 4.0.1, but other distributions should work as well.
Set Up a KDC on a Linux Box
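Install the Kerberos 5 server and client packages, including kadmin (on CentOS: krb5-server, krb5-libs and krb5-workstation). The first thing you have to do is replace the default realm, EXAMPLE.COM, with your own realm in /etc/krb5.conf, /var/kerberos/krb5kdc/kdc.conf and /var/kerberos/krb5kdc/kadm5.acl. I used ALO.ALT here.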
Example config:
# hadoop1> cat /etc/krb5.conf
[libdefaults]
default_realm = ALO.ALT
dns_lookup_realm = false
dns_lookup_kdc = false
[realms]
ALO.ALT = {
kdc = HADOOP1.ALO.ALT:88
admin_server = HADOOP1.ALO.ALT:749
default_domain = HADOOP1.ALO.ALT
}
[domain_realm]
.alo.alt = ALO.ALT
alo.alt = ALO.ALT
[logging]
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmin.log
default = FILE:/var/log/krb5lib.log
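Since every node has to agree on this file, a quick check of what a node actually reads can save some debugging. Below is a rough sketch with two hypothetical helpers (krb5_default_realm and krb5_kdc_for_realm); they are simple awk readers for the flat layout above, not a real krb5.conf parser.

```shell
# Hypothetical helpers: rough readers for a krb5.conf-style file.
# They assume the simple layout shown above (one "key = value" per line).

# Print the default_realm from [libdefaults].
krb5_default_realm() {
  awk -F= '/^[ \t]*default_realm[ \t]*=/ {
    gsub(/[ \t]/, "", $2)   # strip blanks around the value
    print $2
    exit
  }' "${1:-/etc/krb5.conf}"
}

# Print the kdc entry inside the given realm's block in [realms].
krb5_kdc_for_realm() {
  awk -v realm="$1" '
    $1 == realm && $2 == "=" && $3 == "{" { inrealm = 1; next }
    inrealm && $1 == "}"                   { inrealm = 0 }
    inrealm && $1 == "kdc" && $2 == "="    { print $3; exit }
  ' "${2:-/etc/krb5.conf}"
}

# Usage on a node with the config above:
#   krb5_default_realm            # expected: ALO.ALT
#   krb5_kdc_for_realm ALO.ALT    # expected: HADOOP1.ALO.ALT:88
```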
Now adjust your DNS or /etc/hosts to match these settings. If you use /etc/hosts, make sure you deploy both files (hosts as well as krb5.conf) to all of your nodes.
# hadoop1> cat /etc/hosts
192.168.56.101 hadoop1.alo.alt hadoop1
172.22.2.130 hadoop2.alo.alt hadoop2
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
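Kerberos service principals are tied to a host's fully qualified name, so each node's own name should resolve to its FQDN. In a hosts file the first name after the address is the canonical name and the rest are aliases. A rough check, using a hypothetical helper called canonical_name:

```shell
# Hypothetical helper: print the canonical (first) name that a hosts file
# gives for a name or alias. Comment lines are skipped.
canonical_name() {
  awk -v h="$1" '
    /^[ \t]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == h) { print $2; exit } }
  ' "${2:-/etc/hosts}"
}

# Usage on each node; the result should be the FQDN used in the principals:
#   canonical_name "$(hostname -s)"    # e.g. hadoop1.alo.alt
```

If you rely on DNS instead of /etc/hosts, `getent hosts <name>` shows the canonical name the resolver actually returns.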
Enable ACLs by adding a rule to the KDC's kadm5.acl file (this one grants every */admin principal in the realm full privileges):
# hadoop1> cat /var/kerberos/krb5kdc/kadm5.acl
*/admin@ALO.ALT *
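The rule reads: any principal whose instance is admin in realm ALO.ALT gets all privileges (the trailing *). As a rough illustration only, since kadmind does its own matching, the hypothetical is_kadmin_admin function below approximates that principal pattern with shell patterns:

```shell
# Hypothetical approximation of the kadm5.acl entry "*/admin@ALO.ALT *":
# succeeds for two-component principals whose instance is "admin".
is_kadmin_admin() {
  case $1 in
    /*|*/*/*) return 1 ;;        # empty first component or too many components
    */admin@ALO.ALT) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_kadmin_admin alice/admin@ALO.ALT && echo "full kadmin privileges"
```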
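Now create the Kerberos database with "kdb5_util create -s" (the -s flag writes the master key to a stash file, so the KDC can start without prompting for it), then start the KDC and kadmin services (krb5kdc and kadmin on CentOS). If you run into errors, revisit your config.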
Adding Principals
To use Kerberos you have to add some principals. Since we're talking about Hadoop, I set up all the principals it needs. Run these commands in kadmin.local on the server that hosts the KDC:
addprinc -randkey hdfs/hadoop1.alo.alt@ALO.ALT
addprinc -randkey mapred/hadoop1.alo.alt@ALO.ALT
addprinc -randkey yarn/hadoop1.alo.alt@ALO.ALT
addprinc -randkey hbase/hadoop1.alo.alt@ALO.ALT
addprinc -randkey HTTP/hadoop1.alo.alt@ALO.ALT
addprinc <YOUR_USERNAME>@ALO.ALT
kadmin.local prompts you for a password for <YOUR_USERNAME>; that's all. Now you should be able to switch to your user account and get a ticket (su - <YOUR_USERNAME>, then run kinit).
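The list above covers a single host. With more nodes (hadoop2 from the hosts file, for example), every host needs its own service principals. The hypothetical gen_addprinc helper below only prints the commands, so you can review them before feeding them to kadmin.local -q; the service list simply mirrors the one above and should be trimmed to the roles each node runs.

```shell
# Hypothetical helper: print addprinc commands for every host given.
# The service list mirrors the commands above; trim it to each node's roles.
gen_addprinc() {
  realm=$1
  shift
  for host in "$@"; do
    for svc in hdfs mapred yarn hbase HTTP; do
      printf 'addprinc -randkey %s/%s@%s\n' "$svc" "$host" "$realm"
    done
  done
}

# Usage on the KDC host, after reviewing the output:
#   gen_addprinc ALO.ALT hadoop1.alo.alt hadoop2.alo.alt |
#     while read -r cmd; do kadmin.local -q "$cmd"; done
```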
Next, export keytabs for all the services you want to secure (in practice, every service you want to start in your cluster). In this case we need keytabs for HDFS, MapReduce, YARN and HBase. We start with HDFS, MapReduce and YARN, again from the kadmin.local prompt:
xst -norandkey -k hdfs.keytab hdfs/hadoop1.alo.alt@ALO.ALT HTTP/hadoop1.alo.alt@ALO.ALT
xst -norandkey -k mapred.keytab mapred/hadoop1.alo.alt@ALO.ALT HTTP/hadoop1.alo.alt@ALO.ALT
xst -norandkey -k yarn.keytab yarn/hadoop1.alo.alt@ALO.ALT HTTP/hadoop1.alo.alt@ALO.ALT
The keytabs are now in your current working directory:
# hadoop1> ls
hdfs.keytab mapred.keytab yarn.keytab
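Each keytab above also contains the host's HTTP principal, and that is why -norandkey matters: without it, xst re-randomizes a principal's key (and bumps its key version) every time it exports it, so the HTTP entry in hdfs.keytab would stop working as soon as you exported mapred.keytab. On the krb5 release CentOS 6 ships, -norandkey is only accepted by kadmin.local. For additional hosts, the hypothetical gen_xst helper prints the same three commands for a given host:

```shell
# Hypothetical helper: print the xst commands that export, for one host,
# a keytab per service that also holds that host's HTTP principal.
gen_xst() {
  realm=$1
  host=$2
  for svc in hdfs mapred yarn; do
    printf 'xst -norandkey -k %s.keytab %s/%s@%s HTTP/%s@%s\n' \
      "$svc" "$svc" "$host" "$realm" "$host" "$realm"
  done
}

# Usage, in a per-host directory so keytabs of different hosts don't mix:
#   mkdir -p hadoop2 && cd hadoop2
#   gen_xst ALO.ALT hadoop2.alo.alt | while read -r cmd; do kadmin.local -q "$cmd"; done
```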
Copy the files into $HADOOP_HOME/conf/, then give each one the correct owner and make it readable by that owner only (chown hdfs:hadoop hdfs.keytab && chmod 400 hdfs.keytab, chown mapred:hadoop mapred.keytab && chmod 400 mapred.keytab, and so on). From here on, all steps are fairly easy and well documented, so I only post the necessary config changes.
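A keytab is as good as a password, so it's worth checking the permissions after copying. A minimal sketch: keytab_perms_ok is a hypothetical helper, and the install line in the comment is one way (GNU coreutils) to copy, chown and chmod in a single step.

```shell
# Hypothetical helper: succeed only if the file's permission bits are exactly
# 0400 (owner read-only) and, if a second argument is given, it is owned by
# that user. find's -perm without a leading "-" is an exact match.
keytab_perms_ok() {
  if [ -n "$2" ]; then
    [ -n "$(find "$1" -prune -perm 400 -user "$2" 2>/dev/null)" ]
  else
    [ -n "$(find "$1" -prune -perm 400 2>/dev/null)" ]
  fi
}

# Usage, as root:
#   install -o hdfs -g hadoop -m 400 hdfs.keytab "$HADOOP_HOME/conf/"
#   keytab_perms_ok "$HADOOP_HOME/conf/hdfs.keytab" hdfs || echo "fix permissions"
```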
Enable Kerberos in a Cluster
I only list the changes. If a property isn't in your *.xml yet, wrap it in the XML notation (<property><name>NAME</name><value>VALUE</value></property>).
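If you'd rather not wrap each line by hand, a small converter can do it. A sketch with hypothetical helper names; it expects exactly the "name = value" form used in this post and does not escape XML special characters.

```shell
# Hypothetical helper: wrap one NAME/VALUE pair in Hadoop's <property> markup.
hadoop_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Hypothetical converter: read "name = value" lines on stdin and print one
# <property> block per line; blank lines and #-comments are skipped.
props_to_xml() {
  while IFS= read -r line; do
    case $line in
      ''|'#'*) continue ;;
    esac
    hadoop_property "${line%% = *}" "${line#* = }"
  done
}

# Usage: paste the output inside the <configuration> element of hdfs-site.xml.
#   printf 'dfs.webhdfs.enabled = true\n' | props_to_xml
```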
hdfs-site.xml:
dfs.block.access.token.enable = true
dfs.namenode.keytab.file = <PATH/TO/hdfs.keytab>
dfs.namenode.kerberos.principal = hdfs/_HOST@ALO.ALT
dfs.namenode.kerberos.internal.spnego.principal = HTTP/_HOST@ALO.ALT
dfs.secondary.namenode.keytab.file = <PATH/TO/hdfs.keytab>
dfs.secondary.namenode.kerberos.principal = hdfs/_HOST@ALO.ALT
dfs.secondary.namenode.kerberos.internal.spnego.principal = HTTP/_HOST@ALO.ALT
dfs.datanode.data.dir.perm = 700
dfs.datanode.address = 0.0.0.0:1004
dfs.datanode.http.address = 0.0.0.0:1006
dfs.datanode.keytab.file = <PATH/TO/hdfs.keytab>
dfs.datanode.kerberos.principal = hdfs/_HOST@ALO.ALT
dfs.webhdfs.enabled = true
dfs.web.authentication.kerberos.principal = HTTP/_HOST@ALO.ALT
dfs.web.authentication.kerberos.keytab = <PATH/TO/hdfs.keytab>
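The _HOST placeholder is what lets you ship the same hdfs-site.xml to every node: at runtime Hadoop replaces it with the local host's fully qualified name (lower-cased, as far as I know), which is why the principals were created with lower-case host names. A small imitation of that substitution, purely for illustration; expand_host is hypothetical, not a Hadoop tool.

```shell
# Hypothetical imitation of Hadoop's _HOST substitution: for a principal of the
# form service/_HOST@REALM, put the lower-cased FQDN in place of _HOST;
# any other principal is printed unchanged.
expand_host() {
  principal=$1
  fqdn=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case $principal in
    */_HOST@*) printf '%s/%s@%s\n' "${principal%%/*}" "$fqdn" "${principal#*@}" ;;
    *)         printf '%s\n' "$principal" ;;
  esac
}

# Usage: the expanded name must exist in the keytab (compare with klist -k -t):
#   expand_host hdfs/_HOST@ALO.ALT "$(hostname -f)"
```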
Start the NameNode, watch the logs for issues and check whether you can connect (with a valid ticket from kinit):
# hadoop1> hadoop fs -ls /
It's important to set the sticky bit on /tmp:
# hadoop1> sudo -u hdfs kinit -k -t <PATH/TO/hdfs.keytab> hdfs/hadoop1.alo.alt@ALO.ALT
# hadoop1> sudo -u hdfs hadoop fs -chmod 1777 /tmp
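The leading 1 in 1777 is the sticky bit: everybody may create files in /tmp, but only a file's owner, the directory's owner or the superuser can delete or rename it. In a listing it shows up as a t in place of the last x, so after the chmod, hadoop fs -ls / should show /tmp as drwxrwxrwt. To make the octal-to-symbolic mapping concrete, here is a hypothetical helper in the style of ls -l:

```shell
# Hypothetical helper: render a 3- or 4-digit octal mode as ls-style
# permission characters, including the sticky bit (t, or T without o+x).
# Setuid/setgid bits are ignored in this sketch.
mode_to_symbolic() {
  m=$(( 0$1 ))                        # leading 0 makes the shell read octal
  out=""
  for s in 6 3 0; do                  # user, group, other
    b=$(( (m >> s) & 7 ))
    if [ $(( b & 4 )) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $(( b & 2 )) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $(( b & 1 )) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
  done
  if [ $(( m & 512 )) -ne 0 ]; then   # 512 = octal 1000, the sticky bit
    case $out in
      *x) out="${out%?}t" ;;
      *)  out="${out%?}T" ;;
    esac
  fi
  printf '%s\n' "$out"
}

# Usage:
#   mode_to_symbolic 1777    # rwxrwxrwt
```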
mapred-site.xml:
mapreduce.jobtracker.kerberos.principal = mapred/_HOST@ALO.ALT
mapreduce.jobtracker.keytab.file = <PATH/TO/mapred.keytab>
mapreduce.tasktracker.kerberos.principal = mapred/_HOST@ALO.ALT
mapreduce.tasktracker.keytab.file = <PATH/TO/mapred.keytab>
mapred.task.tracker.task-controller = org.apache.hadoop.mapred.LinuxTaskController
mapreduce.tasktracker.group = mapred
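A keytab.file property that accidentally holds a principal instead of a path only shows up as an error once the daemon starts. A rough lint over "name = value" lists like the ones in this post can catch that earlier. lint_kerberos_props is a hypothetical helper, and its two rules (keytab values must be absolute paths, principal values must carry an @REALM) are my own simplifications; the <PATH/TO/...> placeholders are flagged until you fill them in.

```shell
# Hypothetical lint for "name = value" lines on stdin. Prints a warning and
# returns 1 if a *.keytab.file / *.keytab value is not an absolute path, or a
# *.principal value has no @REALM part.
lint_kerberos_props() {
  status=0
  while IFS= read -r line; do
    name=${line%% = *}
    value=${line#* = }
    case $name in
      *.keytab.file|*.keytab)
        case $value in
          /*) ;;
          *) printf 'WARN %s: keytab value is not an absolute path: %s\n' "$name" "$value"
             status=1 ;;
        esac ;;
      *.principal)
        case $value in
          *@*) ;;
          *) printf 'WARN %s: principal has no @REALM: %s\n' "$name" "$value"
             status=1 ;;
        esac ;;
    esac
  done
  return $status
}

# Usage, with the properties saved one per line in a (hypothetical) file:
#   lint_kerberos_props < mapred-props.txt
```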
Create a file called taskcontroller.cfg in $HADOOP_HOME/conf/ with the following content:
# hadoop1> cat /etc/hadoop/conf/taskcontroller.cfg
hadoop.log.dir=/var/log/hadoop-0.20-mapreduce/
mapred.local.dir=/opt/hadoop/hdfs/mapred/local
mapreduce.tasktracker.group=mapred
banned.users=mapred,hdfs,bin
min.user.id=500
Now start the TaskTracker and the JobTracker. Watch the logs for issues, and if everything goes well, run a sample MR job (pi or something similar) as a "normal" user for testing. Note that you have to kinit first; use klist to check that you have obtained a valid ticket.
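The last two settings in taskcontroller.cfg decide who may run tasks at all: accounts listed in banned.users and accounts with a UID below min.user.id (on CentOS 6, regular users start at UID 500) are rejected. Before running a test job it can be worth checking your own account against the file. The sketch below only mirrors that documented intent, not the task-controller's actual code; task_user_allowed is a hypothetical helper, not part of Hadoop.

```shell
# Hypothetical sketch of the taskcontroller.cfg user checks: a user listed in
# banned.users, or with a UID below min.user.id, may not run tasks.
task_user_allowed() {
  cfg=$1 user=$2 uid=$3
  banned=$(sed -n 's/^banned\.users=//p' "$cfg")
  min_uid=$(sed -n 's/^min\.user\.id=//p' "$cfg")
  case ",$banned," in
    *",$user,"*) return 1 ;;          # explicitly banned
  esac
  [ "$uid" -ge "${min_uid:-0}" ]      # this sketch treats a missing min.user.id as 0
}

# Usage:
#   task_user_allowed /etc/hadoop/conf/taskcontroller.cfg "$(id -un)" "$(id -u)" && echo ok
```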
Hello, I have a question:
Do you need a separate user for each service (hdfs, mapred), or can they all be started as one user, say hduser? From what I understand, it doesn't matter which user you start the services as, as long as you kinit properly using the keytab for each service. Am I wrong in this logic? I am getting failed connections from my keytabs when trying to start the NameNode. I have tried starting it as the hdfs user, but that only leads to permission issues or NameNode class-not-found exceptions. Any suggestions would be much appreciated.
Thank you