How to set up Hadoop 2.9 in Pseudo Cluster mode on a remote PC using SSH

In my <other tutorial> we learned what Hadoop is, why Hadoop is so awesome and what Hadoop is used for. Now I will show you how to set up Hadoop 2.9 in Pseudo Cluster mode on a VM using SSH.

Download Hadoop 2.9

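wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz

(Take the binary archive, not the -src one, which only contains source code that you would have to build first.)

Then extract it
tar -xvzf hadoop-2.9.0.tar.gz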

Remember where you extracted it, because we will need to add the path to the environment variables later!
To get the path of the current directory, use the handy command
pwd

Install SSH and rsync
sudo apt-get install ssh
sudo apt-get install rsync

Set up SSH connection to localhost
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod og-wx ~/.ssh/authorized_keys
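
The SSH server is usually strict about permissions, so a login can still ask for a password if the ~/.ssh folder is writable by other users. Here is a minimal sketch for tightening the permissions and testing the login, assuming OpenSSH with its default settings. The function names secure_ssh_dir and check_localhost_ssh are my own and not part of any tool.

```shell
# Hypothetical helper: restrict an SSH directory and its key list to the owner.
secure_ssh_dir() {
    dir="${1:-$HOME/.ssh}"
    chmod 700 "$dir"
    chmod 600 "$dir/authorized_keys"
}

# Hypothetical helper: succeeds only if key-based login to localhost works
# without any prompt (BatchMode disables interactive prompts).
check_localhost_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true
}
```

Run secure_ssh_dir and then check_localhost_ssh && echo "passwordless login works". Connect once with a plain ssh localhost beforehand to accept the host key, because the batch check cannot answer that question for you.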

Set up Hadoop Environment Variables

gedit ~/.bashrc

and add the following lines, which define the variables Hadoop needs
export HADOOP_HOME=/path/to/hadoop/folder
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
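
The new variables only exist in shells that have read the updated file, so reload it with source ~/.bashrc or open a new terminal. To check the result, something like the following sketch could help. The function name check_hadoop_env is made up for this tutorial, and it assumes your Hadoop folder contains the usual bin/hadoop launcher.

```shell
# Hypothetical helper: sanity-check the variables defined in ~/.bashrc.
check_hadoop_env() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set"
        return 1
    fi
    if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
        echo "no executable found at $HADOOP_HOME/bin/hadoop"
        return 1
    fi
    echo "HADOOP_HOME looks usable: $HADOOP_HOME"
}
```

After source ~/.bashrc, run check_hadoop_env. If it passes, hadoop version should print the release you downloaded.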

The next step is to edit the hadoop-env.sh file, located inside your Hadoop folder at etc/hadoop/hadoop-env.sh.
We will add your Java home path to the Hadoop settings.
Change
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
To make sure you use the right path, run
echo $JAVA_HOME
in your terminal to print the Java home path
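
If that prints nothing, the variable is simply not set on your machine. One way to find the path anyway is to follow the java command back to its real location. This is only a sketch, assuming a typical Linux layout where the binary lives in bin/java (or jre/bin/java for OpenJDK 8) below the Java home. The function name java_home_from_binary is my own.

```shell
# Hypothetical helper: derive a Java home from the path of a java binary.
java_home_from_binary() {
    # Resolve all symlinks (e.g. /usr/bin/java -> /etc/alternatives/java -> ...).
    java_bin="$(readlink -f "$1")"
    # Strip the trailing /bin/java to get the directory above it.
    home="${java_bin%/bin/java}"
    # OpenJDK 8 packages keep the binary under jre/bin, so drop a trailing /jre too.
    echo "${home%/jre}"
}
```

Call it as java_home_from_binary "$(command -v java)" and use the printed path in hadoop-env.sh.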

Enable Pseudo Cluster Mode

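Now we can finally set up the configuration for Hadoop's pseudo-distributed mode.
The files to edit are located in the etc/hadoop folder inside your Hadoop folder. In each file, place the properties shown below between the <configuration> and </configuration> tags.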

hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/user/hadoop/data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/user/hadoop/data/hdfs/datanode</value>
</property>
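
The two folders named in this file can be created up front, so that they belong to the user who will run Hadoop. A minimal sketch, where the function name make_hdfs_dirs is just something I made up:

```shell
# Hypothetical helper: create the namenode and datanode folders below one base folder.
make_hdfs_dirs() {
    base="$1"
    mkdir -p "$base/namenode" "$base/datanode"
}
```

With the paths from above this would be make_hdfs_dirs /home/user/hadoop/data/hdfs. Replace user with your own user name.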

mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
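
In the Hadoop 2.x archives, this file normally ships only as mapred-site.xml.template, so you may have to create mapred-site.xml from the template first. A small sketch of that step, assuming the template sits in the same folder. The function name ensure_mapred_site is hypothetical.

```shell
# Hypothetical helper: create mapred-site.xml from the template if it is missing.
ensure_mapred_site() {
    conf_dir="$1"
    if [ ! -f "$conf_dir/mapred-site.xml" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}
```

Call it with the configuration folder, for example ensure_mapred_site "$HADOOP_HOME/etc/hadoop". An existing mapred-site.xml is left untouched.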

yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
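
To show how such a snippet sits inside a complete file, here is a sketch that writes a whole yarn-site.xml with the property wrapped in the configuration element. The function name write_yarn_site is my own, and the function replaces the existing file after keeping a .bak copy, so treat it as an illustration and not as the only way to do it.

```shell
# Hypothetical helper: write a minimal yarn-site.xml into the given folder.
write_yarn_site() {
    target="$1/yarn-site.xml"
    # Keep a backup of whatever was there before.
    if [ -f "$target" ]; then
        cp "$target" "$target.bak"
    fi
    cat > "$target" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}
```

For example: write_yarn_site "$HADOOP_HOME/etc/hadoop". The other three files follow the same pattern with their own properties.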

core-site.xml

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>

Then format the file system. Run this from inside your Hadoop folder:

bin/hdfs namenode -format

and we are done!
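
As a quick check that the formatting worked, you can look for the VERSION file that the format step writes into the namenode folder. A minimal sketch, assuming the namenode folder configured in hdfs-site.xml. The function name is_namenode_formatted is made up.

```shell
# Hypothetical helper: succeeds if the given namenode folder has been formatted.
is_namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}
```

With the path from above: is_namenode_formatted /home/user/hadoop/data/hdfs/namenode && echo "formatted". Keep in mind that formatting again later wipes the existing HDFS metadata, so only do it on a fresh setup.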

To see how to run Hadoop, check this article out!
