Update 1 (Nov 21, 2011):
- added 3rd interface as host-only-adapter (hadoop1)
- enabled trusted device eth2
About one year ago, I created a small Xen environment for my engineering purposes. When I was traveling for hours, it was very helpful for tracking down issues or testing new features. The problem was that I had to carry two notebooks with me. That was the reason I switched to VirtualBox [1], which runs on OSX, Linux and Windows alike. I could play with my servers, and once I had configured them to death, I reimported them into a clean setup. I think the appliance will also be a good start for people who are new to the Hadoop ecosystem and want to see its power without the pain of configuring a multi-node environment.
The appliance is built with VirtualBox because it runs on OSX and Windows very easily. The idea behind it is to check new settings quickly in a small environment; the appliance is designed for research, not for development and certainly not for production. The appliance has 4 nodes: one master and 3 slaves. The setup is not perfect, but it matched the environment I created it for. We have no separate secondary namenode, for example. I set up HDFS, Hive with a MySQL metastore, and HBase in distributed mode with ZooKeeper and Stargate.
Before we can play with our own lab, there are a few special requirements to take care of. Please read the page [2] I created for this.
[1] https://www.virtualbox.org/wiki/Downloads
[2] http://mapredit.blogspot.com/p/all-in-one-hadoop-multi-node-appliance.html
Pretty interesting. This is a good way to create a Hadoop test environment, and our team is actually going to use it. I currently use VMware Player to do something similar on one box, to get a full cluster up for testing purposes. I am the lead developer of oceansync.com, a Hadoop management software tool, so it's important to have a portable test environment where I can test things quickly.
I was testing with vmware-player, but I missed some features VirtualBox provides. The first is transparency: I can use the app on OSX, Windows (7) and Linux alike.
For consulting it is really cool: you can demonstrate changes live in seconds.
Thanks a lot for this contribution! Based on this, I could prepare my test environment in just a few minutes. I also tested with vmware-player and finally switched to VirtualBox too, which now runs on Windows 7 and openSUSE 12.1.
What do you think about a GitHub repository to collect useful admin and/or developer scripts, which can then be deployed to a clean, preinstalled demo, test, or training cluster based on your work?
@Mirko: Sounds like a good idea, especially for ant builds I think.
I created the repository here ...
https://github.com/kamir/hadoop-admin-and-developer-scripts
Hello Sir,
I am a student and need your help with the error below.
Hadoop error while running in a multi-node cluster:
root@ubuntu:/opt/hadoop-1.0.0# bin/hadoop jar hadoop-examples-1.0.0.jar pi 10 1$
Number of Maps = 10
Samples per Map = 10
12/02/03 09:01:47 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
12/02/03 09:01:48 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
12/02/03 09:01:49 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
12/02/03 09:01:50 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
12/02/03 09:01:51 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
12/02/03 09:01:52 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
12/02/03 09:01:53 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
12/02/03 09:01:54 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
12/02/03 09:01:55 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
Please help me if anyone has a solution for this error.
Configuration:
hadoop 1.0
Ubuntu 11.10
jdk 1.7