In my <other tutorial> we learned what Hadoop is, why Hadoop is so awesome and what it is used for. Now I will show you how to set up Hadoop 2.9 in pseudo-distributed (single-node cluster) mode on a VM using SSH.
Download Hadoop 2.9
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz
Make sure to grab the binary release; the hadoop-2.9.0-src.tar.gz archive contains only the source code, which you would have to build yourself.
Then extract it
tar -xvzf hadoop-2.9.0.tar.gz
Remember where you extracted this to, because we will need to add the path to the environment variables later!
To get the path, use the handy command
pwd
Install SSH and rsync
sudo apt-get install ssh
sudo apt-get install rsync
Set up an SSH connection to localhost
ssh-keygen -t rsa
(press Enter at the prompts to accept the default key location and an empty passphrase, so Hadoop can log in without being asked for a password)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod og-wx ~/.ssh/authorized_keys
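To confirm that key-based login works, you can try connecting once (this assumes the SSH server installed above is running locally):

```shell
# This should log you in without asking for a password;
# type "yes" if asked to accept the host key the first time
ssh localhost
# close the test session again
exit
```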
Set up the Hadoop environment variables
gedit ~/.bashrc
and add the following lines at the end of the file:
export HADOOP_HOME=/path/to/hadoop/folder
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
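After saving ~/.bashrc, reload it in the current shell and check that the variables took effect (a quick sanity check; the hadoop command will only be found if HADOOP_HOME really points at your extracted folder):

```shell
# Apply the new variables to the current shell session
source ~/.bashrc

# Should print the Hadoop path you configured
echo "$HADOOP_HOME"
# Should print the Hadoop version banner
hadoop version
```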
The next step is to edit the hadoop-env.sh file, located at etc/hadoop/hadoop-env.sh inside your Hadoop folder.
We will add your Java home path to the Hadoop settings.
Change
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
To make sure you use the right path, write
echo $JAVA_HOME
in your Terminal to print your Java home path
Enable Pseudo Cluster Mode
Now we can finally set up the configuration for Hadoop's pseudo-distributed mode.
The files to edit are located inside the HadoopBase/etc/hadoop folder. In each file, the <property> entries below go inside the existing <configuration> element.
hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/user/hadoop/data/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/user/hadoop/data/hdfs/datanode</value>
</property>
(dfs.name.dir and dfs.data.dir are the old, deprecated names for these two properties)
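The namenode and datanode directories referenced in hdfs-site.xml should be created up front (the commands below assume your home directory is /home/user, matching the file:// values above; adjust them to your own paths):

```shell
# Create the local storage directories for the NameNode and DataNode;
# these must match the file:// paths in hdfs-site.xml
mkdir -p ~/hadoop/data/hdfs/namenode
mkdir -p ~/hadoop/data/hdfs/datanode
```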
mapred-site.xml (this file does not exist by default; create it first with cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
Then format the file system
bin/hdfs namenode -format
and we are done!
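As a quick sanity check before moving on (a sketch, assuming the format step completed without errors), you can bring up the daemons and list the running Java processes:

```shell
# Start the HDFS daemons (NameNode, DataNode, SecondaryNameNode)
sbin/start-dfs.sh
# Start the YARN daemons (ResourceManager, NodeManager)
sbin/start-yarn.sh
# jps lists running Java processes; the daemons above should appear here
jps
```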
To see how to run Hadoop check this article out!