This is how I installed Hadoop-2.5.1 on my VirtualBox VM running Ubuntu 14.04.

Download the latest Hadoop release from the Apache Hadoop downloads page: from the ‘/stable’ folder there, download hadoop-2.5.1.tar.gz.

Now that the required version of Hadoop is downloaded, let’s install the required software.

Install Java:
The Hadoop framework is written mostly in Java, so Java is required. Run the following commands to install it if it is not already present:
$ sudo apt-get update
$ sudo apt-get install default-jdk
# check if java is properly installed using the following command
$ java -version
# The above command should print the version of Java that has been installed

Add a dedicated hadoop user:
$ sudo addgroup hadoop # creates a group called 'hadoop'
$ sudo adduser --ingroup hadoop hduser # creates a user named 'hduser' in the 'hadoop' group

Install SSH:
$ sudo apt-get install ssh
# Check if ssh is properly installed using the following commands, which should return paths like /usr/bin/ssh and /usr/sbin/sshd when executed
$ which ssh
$ which sshd

Create and setup SSH certificates:
#Assign sudo permissions to the user ‘hduser’ using the following command
$ sudo adduser hduser sudo
$ su hduser
# Generate an SSH key
$ ssh-keygen -t rsa -P ""
# Add the newly created public key to the list of authorized keys so that Hadoop can use ssh without prompting for a password
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
# Check if SSH works, as follows:
$ ssh localhost
This should return something similar to:
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is e1:8b:a0:a5:75:ef:f4:b4:5e:a9:ed:be:64:be:5c:2f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-27-generic x86_64)
* Documentation:
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

Disabling IPv6:
We need to disable IPv6 because Hadoop binds to 0.0.0.0 for various configurations, and on Ubuntu this can resolve to IPv6 addresses. You will need to run the following commands using a root account:
$ sudo nano /etc/sysctl.conf
Add the following lines to the end of the file, then reboot the machine so the changes take effect:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

$ sudo reboot
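The three settings above can also be appended with a short idempotent loop, so re-running the step never duplicates lines. This is a sketch: it writes to a scratch file here, and on the VM you would point CONF at /etc/sysctl.conf and run it as root.

```shell
# Sketch: append each IPv6-disable key only if it is not already present.
# CONF points at a scratch file for demonstration; on the VM use
# /etc/sysctl.conf (as root) instead.
CONF="$(mktemp)"
for key in net.ipv6.conf.all.disable_ipv6 \
           net.ipv6.conf.default.disable_ipv6 \
           net.ipv6.conf.lo.disable_ipv6; do
  grep -q "^${key}" "$CONF" || echo "${key} = 1" >> "$CONF"
done
cat "$CONF"
```

Because each key is checked with grep before it is appended, running the loop a second time leaves the file unchanged.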

Install Hadoop:
Log in as hduser after rebooting the system:
$ sudo -i -u hduser
Go to the folder containing the hadoop .tar.gz file and run the following command:
$ tar -zxvf hadoop-2.5.1.tar.gz
The above command unpacks the archive into a folder named hadoop-2.5.1.
Move the folder to /usr/local as follows:
$ sudo mv hadoop-2.5.1 /usr/local/hadoop
Change the owner of all the files to the hduser user and hadoop group by using this command:
$ sudo chown -R hduser:hadoop /usr/local/hadoop

Configure Hadoop:
Have vim installed beforehand using
$ sudo apt-get install vim
Go to the directory where Hadoop has been installed:
$ cd /usr/local/hadoop

Following are the files we will be using to configure a Single-node hadoop cluster:
i. yarn-site.xml
ii. core-site.xml
iii. mapred-site.xml
iv. hdfs-site.xml
v. ~/.bashrc

These configuration files live in the Hadoop configuration directory:
$ cd /usr/local/hadoop/etc/hadoop

Editing yarn-site.xml:
$ sudo vi yarn-site.xml
Place the following between <configuration></configuration>:
<!-- Site specific YARN configuration properties -->
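For a single-node cluster, the usual YARN shuffle settings are the standard Hadoop 2.x values below; verify them against the documentation for your exact version:

```xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```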

Editing core-site.xml:
$ sudo vi core-site.xml
Place the following between <configuration></configuration>:
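A typical single-node core-site.xml points the default filesystem at a local HDFS namenode. Port 9000 is a common choice in tutorials, not a requirement; pick any free port and keep it consistent:

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
```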

Editing mapred-site.xml:
By default, mapred-site.xml is not present; only mapred-site.xml.template is. Create a copy of the template as follows:
$ sudo cp mapred-site.xml.template mapred-site.xml
$ sudo vi mapred-site.xml
Now, paste the code below into the file, save and quit:
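For a YARN-based cluster, mapred-site.xml only needs to declare YARN as the MapReduce framework (the standard Hadoop 2.x property):

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```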

Editing hdfs-site.xml:
Before we change the hdfs-site.xml file run the following commands:
$ sudo mkdir -p /usr/local/hadoop/yarn_data/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop/yarn_data/hdfs/datanode
$ sudo vi hdfs-site.xml
Add the following to the file, save and quit
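A minimal single-node hdfs-site.xml sets the replication factor to 1 (there is only one datanode) and points the namenode and datanode at the directories created above; these are the standard Hadoop 2.x property names:

```xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
</property>
```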

Now, go back to the root using:
$ sudo su


Checking Java Installation

Before editing the .bashrc file in our home directory, we need to find the path where Java has been installed to set the JAVA_HOME environment variable using the following command:
$ update-alternatives --config java
Output should be something similar to the image ‘Checking Java Installation’.

From here we know that our JAVA_HOME should be '/usr/lib/jvm/java-7-openjdk-amd64'.
Modify the "export JAVA_HOME" line in "/usr/local/hadoop/etc/hadoop/hadoop-env.sh" as follows:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Update the ~/.bashrc file as follows:
$ sudo vi ~/.bashrc
Add the following lines at the end of the file:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
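Many setups also add Hadoop's bin and sbin directories to the PATH, so the hadoop command and the start/stop scripts resolve from any directory; a sketch, using the same /usr/local/hadoop location as the install step above:

```shell
# Sketch: extend PATH with Hadoop's bin (hadoop, hdfs commands) and
# sbin (start/stop scripts) directories.
export HADOOP_INSTALL=/usr/local/hadoop
export PATH="$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin"
echo "$PATH"
```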

Formatting, starting and stopping Hadoop filesystem:
$ sudo -i -u hduser
$ cd /usr/local/hadoop/bin

The first step to starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your cluster. You need to do this the first time you set up a Hadoop cluster. Do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS). To format the filesystem (which simply initializes the directory specified by the dfs.namenode.name.dir property), run the following command:

$ hadoop namenode -format


Output if formatting HDFS is a success

This command should return something like the image above if the namenode has been formatted properly.

Start the Hadoop daemons by running the following commands from the /usr/local/hadoop/sbin/ folder:
For namenode:
$ ./hadoop-daemon.sh start namenode
For datanode:
$ ./hadoop-daemon.sh start datanode
For resource manager:
$ ./yarn-daemon.sh start resourcemanager
For node manager:
$ ./yarn-daemon.sh start nodemanager
For job history server:
$ ./mr-jobhistory-daemon.sh start historyserver

Each of the above commands should return something like:


Start Hadoop

We can check if it’s really up and running using the following command:
$ jps
This should return something like below, if Hadoop is up and running:


Checking if Hadoop is up and running properly

Stop Hadoop by running the following commands from the /usr/local/hadoop/sbin/ folder:
$ ./stop-dfs.sh
$ ./stop-yarn.sh

Hadoop can also be started using the following commands from the same /usr/local/hadoop/sbin/ folder:
$ ./start-dfs.sh
$ ./start-yarn.sh

Hadoop Web Interface:
Hadoop comes with several web interfaces which are by default available at these locations:
HDFS NameNode status and health check: http://localhost:50070
HDFS Secondary NameNode status: http://localhost:50090

If you have any MapReduce jobs, you can now test them on this single-node Hadoop cluster.
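A classic first job is the bundled wordcount example, which ships in the 2.5.1 tarball under share/hadoop/mapreduce/ (the exact jar name on your system is an assumption here, so check that directory). What the job computes can be previewed locally with plain coreutils:

```shell
# Local preview of what the wordcount example computes: occurrences of
# each whitespace-separated token. The real job runs on the cluster
# against HDFS input, e.g. `hadoop jar <examples-jar> wordcount <in> <out>`.
counts=$(printf 'hadoop yarn hadoop hdfs hadoop\n' | tr ' ' '\n' | sort | uniq -c | sort -rn)
echo "$counts"
```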