Hadoop Cluster – Part 3 (Installation and Configuration)

This post details the installation of a Hadoop cluster. It assumes Linux is already installed and configured for use. If it is not, follow the earlier posts in this series to create Linux machines on VMs.

Commands to be executed are shown in blue. Example: yum install wget

Pre-requisites:

Package Management Utilities:

  • Update yum (Yellowdog Updater, Modified)
    • yum update
  • Before installing / updating wget, check whether it is installed by running the command
    • yum search wget

If installed, running the above command should produce the output below. As shown, wget is a utility to download packages from the web (HTTP or FTP).

image 

  • Update / install wget
    • yum install wget
    • yum update wget
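
The check-then-install step above can also be scripted. What follows is only a minimal sketch, not part of the original setup: the helper names have_cmd and ensure_pkg are hypothetical, it assumes the command and the package share a name (true for wget), and by default it only prints the yum command instead of running it.

```shell
#!/bin/sh
# have_cmd NAME: succeeds when NAME resolves to an executable on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# ensure_pkg NAME (hypothetical helper): install NAME with yum if missing.
# Assumes the package name equals the command name.
# With DRY_RUN=1 (the default here) it only prints what it would do.
ensure_pkg() {
    pkg=$1
    if have_cmd "$pkg"; then
        echo "$pkg already installed"
    elif [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: yum -y install $pkg"
    else
        yum -y install "$pkg"
    fi
}

ensure_pkg wget
```

Run it with DRY_RUN=0 (as root) once the printed command looks right.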

Note:

If SSH is not installed, install SSH so that PuTTY can be used to connect to the Linux machine.

  • yum -y install openssh-server openssh-clients

JAVA:

  • Search whether Java is available
    • yum search java | grep 'java-'
  • Install Java using the yum package management utility
    • yum install java-1.7.0-openjdk-devel
  • Check that Java is installed by running the command
    • java -version
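
The version check can be reduced to a single value for use in scripts. This is a rough sketch under a few assumptions: the function name java_version_from is made up for illustration, and it relies on the version appearing in double quotes after the word "version" in the output. Note that java -version writes to stderr, so the output has to be redirected before it can be piped.

```shell
#!/bin/sh
# java_version_from (hypothetical helper): read `java -version` style text
# on stdin and print the first quoted version string found.
java_version_from() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Only query Java when it is actually on PATH.
if command -v java >/dev/null 2>&1; then
    # java -version prints to stderr, hence 2>&1.
    java -version 2>&1 | java_version_from
fi
```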

Installation of CDH 5.4.1:

Managed deployment using Cloudera Manager:

Hadoop can be installed in two major ways: as managed services (using Cloudera Manager) or as unmanaged services (manual installation). Cloudera Manager simplifies the deployment, configuration and operation of Hadoop, and provides centralized monitoring, diagnosis and troubleshooting.

A managed deployment consists of a centrally located Cloudera Manager with agents installed on the cluster hosts. Using Cloudera Manager, software can be installed, deployed or pushed to cluster hosts through the agents. This is useful when CDH and related components need to be deployed on multiple machines. Additionally, Cloudera Manager makes it simple to build PoC servers with all required components installed by default.

Unmanaged manual deployment:

The manual method is more useful for understanding what the heck this magic Cloudera Manager is doing. So, in this post we will work only through the manual installation of CDH 5.4.1 on CentOS. Maybe future posts will cover Cloudera Manager for installation / deployment. Even with unmanaged (manual) deployment there are multiple options:

  • Download and install CDH “1-Click” package
  • Add CDH5 repository
  • Building own repository

The subsequent steps detail the manual method of installing CDH using “Building Own Repository”.

  • Download the Cloudera CDH 5 repository file (used here for CDH 5.4.1) using the wget utility
    • wget archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
  • Add the repository to yum. yum repository definitions are located in /etc/yum.repos.d
    • sudo cp cloudera-cdh5.repo /etc/yum.repos.d/
  • Open and view the contents of the Cloudera repository file
    • vi cloudera-cdh5.repo (sample output below)

image

Reference for using repositories in CentOS: https://www.centos.org/docs/5/html/yum/sn-using-repositories.html
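
Instead of opening the repository file in vi, a single setting can be read from it on the command line. The snippet below is a sketch only: repo_value is a hypothetical helper, it assumes the usual key=value layout of yum .repo files, and it makes no claim about what the Cloudera file actually contains.

```shell
#!/bin/sh
# repo_value FILE KEY (hypothetical helper): print the value of the first
# "KEY = value" line in a yum .repo file. KEY is assumed to be a plain word.
repo_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Show where yum will fetch packages from, if the file has been copied over.
repo_file=/etc/yum.repos.d/cloudera-cdh5.repo
if [ -r "$repo_file" ]; then
    repo_value "$repo_file" baseurl
fi
```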

  • Install the ZooKeeper daemon service
    • sudo yum install zookeeper-server
  • Start the ZooKeeper service
    • On a fresh installation, initialize the ZooKeeper service first using
      • sudo service zookeeper-server init --myid=1
    • Start the service
      • sudo service zookeeper-server start

Reference: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_zookeeper_package_install.html
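
Once the service is started, a quick way to confirm it is answering is ZooKeeper's "ruok" four-letter command, to which a healthy server replies "imok". The sketch below assumes the default client port 2181 and that nc (netcat) is installed, neither of which is set up by the steps above; zk_reply_ok is a hypothetical helper name.

```shell
#!/bin/sh
# zk_reply_ok (hypothetical helper): succeeds when stdin is exactly the
# reply a healthy ZooKeeper server gives to the "ruok" command.
zk_reply_ok() {
    [ "$(cat)" = "imok" ]
}

# Usage on the ZooKeeper host (assumes nc is installed and port 2181):
#   echo ruok | nc localhost 2181 | zk_reply_ok && echo "ZooKeeper is up"
```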

  • Modify the network configuration on all master and slave nodes as shown below
    • vi /etc/sysconfig/network
    • Add an entry as below using “vi” or another editor.

image

    • Modify host entries
      • vi /etc/hosts

image

  • Reboot the servers and test that the network configuration changes work by running
    • hostname
    • ping master
    • ping slave
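
Before pinging, it can help to confirm that the host entries were actually saved. A small sketch, assuming the standard hosts file layout (IP address followed by one or more names); host_ip is a hypothetical helper, and the names master and slave are the ones used in the steps above.

```shell
#!/bin/sh
# host_ip FILE NAME (hypothetical helper): print the first IP address that
# a hosts-format file maps to NAME; prints nothing when NAME is absent.
host_ip() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Report what each cluster name resolves to in /etc/hosts.
if [ -r /etc/hosts ]; then
    for h in master slave; do
        echo "$h -> $(host_ip /etc/hosts "$h")"
    done
fi
```

An empty value after the arrow means the entry is missing on that node.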

  • Install YARN Resource Manager.
    • cd /etc/yum.repos.d
    • sudo yum clean all
    • sudo yum install hadoop-yarn-resourcemanager
  • Install Name Node by running.
    • cd /etc/yum.repos.d
    • sudo yum clean all
    • sudo yum install hadoop-hdfs-namenode
    • sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
  • Install Secondary Name Node by running.
    • cd /etc/yum.repos.d
    • sudo yum clean all
    • sudo yum install hadoop-hdfs-secondarynamenode
    • sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
  • Install Data Node by running. Services to install (datanode, mapreduce, yarn node manager)
    • cd /etc/yum.repos.d
    • sudo yum clean all
    • sudo yum install hadoop-hdfs-datanode
    • sudo yum install hadoop-0.20-mapreduce-tasktracker
    • sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce

Reference: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_cdh5_install.html#topic_4_4_4_unique_2
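
Since the same few packages are repeated for each node type, the mapping can be kept in one place. This is a sketch only: the role names and the packages_for helper are made up for illustration, the package names are the ones used in the steps above, and the loop prints the yum commands rather than running them.

```shell
#!/bin/sh
# packages_for ROLE (hypothetical helper): print the yum packages used in
# this post for a given node role. Role names are illustrative.
packages_for() {
    case "$1" in
        resourcemanager)   echo "hadoop-yarn-resourcemanager" ;;
        namenode)          echo "hadoop-hdfs-namenode" ;;
        secondarynamenode) echo "hadoop-hdfs-secondarynamenode" ;;
        worker)            echo "hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce" ;;
        *)                 echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Print the install command for each role; nothing is installed here.
for role in resourcemanager namenode secondarynamenode worker; do
    echo "sudo yum install $(packages_for "$role")"
done
```

After installing, rpm -q followed by a package name confirms that the package is present on the node.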

This concludes the installation of Hadoop on CentOS. Name Node, Data Node, Secondary Name Node, YARN Resource Manager and MapReduce are installed. In the next post we will detail configuring and starting the Hadoop cluster services.

–Abhyast
