Hadoop Cluster – Part 3 (Installation and Configuration)

This post details the installation of a Hadoop cluster. It assumes Linux is already installed and configured. If not, follow the posts below to create Linux machines on VMs.

Commands to be executed are shown in “Blue Color”, for example: yum install wget


Package Management Utilities:

  • Update yum (Yellowdog Updater, Modified)
    • yum update
  • Before installing / updating wget, check whether it is already installed by running
    • yum list installed wget

If wget is installed, the command above shows the wget package. wget is a utility for downloading files from the web (HTTP, HTTPS or FTP).


  • Update / install wget
    • yum install wget
    • yum update wget


If SSH is not installed, install the OpenSSH server and clients so that PuTTY can be used to connect to the Linux machine.

  • yum -y install openssh-server openssh-clients




  • Search the repositories for available Java packages
    • yum search java | grep 'java-'
  • Install Java using the yum package management utility
    • yum install java-1.7.0-openjdk-devel
  • Check that Java is installed by running
    • java -version
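A quick scripted check can confirm the JDK is on the PATH. One gotcha: java -version writes to stderr, not stdout, so the output must be redirected before it can be parsed. A minimal sketch; java_version_from_output is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: report the installed Java version, or warn if java is missing.
# `java -version` writes to stderr, hence the 2>&1 redirect below.

# java_version_from_output  (hypothetical helper)
# Reads `java -version` output on stdin and prints the quoted version string
# from the first line, e.g. 1.7.0_79 from: java version "1.7.0_79"
java_version_from_output() {
  head -n 1 | sed -n 's/.*version "\([^"]*\)".*/\1/p'
}

if command -v java >/dev/null 2>&1; then
  echo "Java version: $(java -version 2>&1 | java_version_from_output)"
else
  echo "java not found; install java-1.7.0-openjdk-devel first" >&2
fi
```

The sed pattern only relies on the quoted version on the first line, so it should also handle OpenJDK builds that print `openjdk version "..."` instead of `java version "..."`.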

Installation of CDH 5.4.1:


Managed deployment using Cloudera Manager:

Hadoop can be installed in two main ways: as managed services (using Cloudera Manager) or as unmanaged services (manual installation). Cloudera Manager simplifies deploying, configuring and operating Hadoop, and it provides centralized monitoring, diagnosis and troubleshooting of Hadoop issues.

In a managed deployment, a centrally located Cloudera Manager works with agents installed on the cluster hosts. Through these agents, Cloudera Manager can install, deploy or push software to the cluster hosts. This is useful when CDH and related components need to be deployed on many machines. Cloudera Manager also makes it simple to build PoC servers with all required components installed by default.

Unmanaged manual deployment:

The manual method is more useful for understanding what the heck this magic Cloudera Manager is doing. So this post covers only the manual installation of CDH 5.4.1 on CentOS. Future posts may use Cloudera Manager for installation / deployment. Even with unmanaged (manual) deployment, there are multiple options:

  • Download and install CDH “1-Click” package
  • Add CDH5 repository
  • Building own repository

The steps below install CDH manually by adding the CDH5 repository file to yum (the “Add CDH5 repository” option).

  • Download Cloudera's CDH 5 repository file using the wget utility (it points to the latest CDH 5 release, 5.4.1 at the time of writing)
    • wget archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
  • Add the repository file to yum's repository directory, /etc/yum.repos.d
    • sudo cp cloudera-cdh5.repo /etc/yum.repos.d/
  • Open and view the contents of the Cloudera repository file
    • vi /etc/yum.repos.d/cloudera-cdh5.repo
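Because the repository file tracks the latest CDH 5 release, a later yum update could move the cluster past 5.4.1. One option is to pin the baseurl. This is a minimal sketch that assumes the baseurl line ends in /cdh/5/, so check your copy first. pin_cdh_version is a hypothetical helper, and the demo edits a scratch file with an illustrative baseurl rather than the real repository file:

```shell
#!/usr/bin/env bash
# Sketch: pin a CDH 5 yum repo file to a specific release.
# Assumption: the baseurl ends in /cdh/5/ (the "latest CDH 5" path).
set -euo pipefail

# pin_cdh_version FILE VERSION  (hypothetical helper)
# Rewrites ".../cdh/5/" in the baseurl line to ".../cdh/VERSION/".
pin_cdh_version() {
  local repo_file="$1" version="$2"
  sed -i "s#^\(baseurl=.*/cdh/\)5/#\1${version}/#" "$repo_file"
}

# Demo on a scratch copy so /etc/yum.repos.d is untouched.
# The file contents below are illustrative, not copied from Cloudera.
tmp=$(mktemp -d)
cat > "$tmp/cloudera-cdh5.repo" <<'EOF'
[cloudera-cdh5]
name=Cloudera's Distribution for Hadoop, Version 5
baseurl=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5/
gpgcheck=1
EOF

pin_cdh_version "$tmp/cloudera-cdh5.repo" 5.4.1
grep '^baseurl' "$tmp/cloudera-cdh5.repo"
```

On a real host, run the same sed expression with sudo against /etc/yum.repos.d/cloudera-cdh5.repo, then run sudo yum clean all so yum picks up the change.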


Reference for using repositories in CentOS: https://www.centos.org/docs/5/html/yum/sn-using-repositories.html

  • Install the ZooKeeper server package
    • sudo yum install zookeeper-server
  • Start the ZooKeeper service
    • On a fresh installation, initialize the ZooKeeper service first
      • sudo service zookeeper-server init --myid=1
    • Start the service
      • sudo service zookeeper-server start

Reference: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_zookeeper_package_install.html

  • Modify the network configuration on all master and slave nodes
    • vi /etc/sysconfig/network
    • Add the host name entry using vi or another editor.


    • Modify the host entries
      • vi /etc/hosts


  • Reboot the servers and check that the network configuration changes work by running
    • hostname
    • ping master
    • ping slave
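The screenshots of the configured files are not available here, so this hedged sketch shows what the /etc/hosts entries for the master and slave nodes might look like. The IP addresses are hypothetical placeholders. The script writes to a scratch file so it can be reviewed before anything touches /etc/hosts:

```shell
#!/usr/bin/env bash
# Sketch: build /etc/hosts-style entries for the cluster nodes.
# The IP addresses below are hypothetical placeholders; use your VMs' addresses.
set -euo pipefail

entries=(
  "192.168.1.10 master"
  "192.168.1.11 slave"
)

hosts_file=$(mktemp)   # scratch file: review it before appending to /etc/hosts
for entry in "${entries[@]}"; do
  read -r ip name <<< "$entry"
  printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
done

cat "$hosts_file"
```

After review, append the entries on each node with sudo tee -a /etc/hosts < "$hosts_file". On CentOS 7, the node's own host name is normally set with hostnamectl set-hostname master rather than through /etc/sysconfig/network.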

  • Install the YARN ResourceManager:
    • cd /etc/yum.repos.d
    • sudo yum clean all
    • sudo yum install hadoop-yarn-resourcemanager
  • Install the NameNode by running:
    • cd /etc/yum.repos.d
    • sudo yum clean all
    • sudo yum install hadoop-hdfs-namenode
    • sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
  • Install the Secondary NameNode by running:
    • cd /etc/yum.repos.d
    • sudo yum clean all
    • sudo yum install hadoop-hdfs-secondarynamenode
    • sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
  • Install a DataNode by running the commands below. Services to install: DataNode, MapReduce and the YARN NodeManager
    • cd /etc/yum.repos.d
    • sudo yum clean all
    • sudo yum install hadoop-hdfs-datanode
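    • sudo yum install hadoop-0.20-mapreduce-tasktracker (MRv1 TaskTracker only; Cloudera does not support running MRv1 and YARN daemons on the same host, so skip this package when using YARN)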
    • sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
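The per-role package lists above can be kept in one place so every node of a given role gets the same packages. This is a minimal sketch using the role names from this post. packages_for_role is a hypothetical helper that only prints the yum command, so you can review it before running it with sudo. It follows the YARN packages and leaves out the MRv1 TaskTracker:

```shell
#!/usr/bin/env bash
# Sketch: print the CDH 5 packages for each node role used in this post.
# packages_for_role is a hypothetical helper; it prints, it installs nothing.
set -euo pipefail

packages_for_role() {
  case "$1" in
    resourcemanager)   echo "hadoop-yarn-resourcemanager" ;;
    namenode)          echo "hadoop-hdfs-namenode hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce" ;;
    secondarynamenode) echo "hadoop-hdfs-secondarynamenode hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce" ;;
    datanode)          echo "hadoop-hdfs-datanode hadoop-yarn-nodemanager hadoop-mapreduce" ;;
    *)                 echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Example: show the install command for a data node.
echo "sudo yum install $(packages_for_role datanode)"
```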

Reference: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_cdh5_install.html#topic_4_4_4_unique_2

This concludes the installation of Hadoop on CentOS. The NameNode, DataNode, Secondary NameNode, YARN ResourceManager and MapReduce are installed. The next post will detail configuring and starting the Hadoop cluster services.



Redis Cluster Setup (Linux on Windows)

There are so many things to share but so little time... Though I started writing about installing and setting up Hadoop on Windows using Hyper-V, I'm deviating to write about Redis. I have grand plans to write about Hadoop, Cassandra, Redis and SQL 14, along with my favorite, Machine Learning. If only I could be better organized :(

Coming to Redis: Redis recently got cluster support. According to the Redis release notes, the Cluster feature was released with Redis 3.0 in April 2015.



So I tried setting up a Redis Cluster (different from the master / slave concept) on Linux (CentOS 7.0). Below are my notes and lessons learnt. Follow the previous posts to install / configure Linux on a Windows machine using Hyper-V.




A few points:

  • The installation process below is comprehensive: it sets up Redis on a Linux machine built from a base install (hence its length).
  • The process below installs Redis as a service, not as a simple executable. If a service is not needed, skip the install_server.sh step.
  • To avoid the process below altogether, use the script Redis provides to set up a cluster.
    • Under the “utils” folder there is a “create-cluster” script. Running the commands below from its folder sets up a cluster with 3 masters and 3 slaves
      • ./create-cluster start
      • ./create-cluster create
  • If Ruby is already installed, skip the Ruby installation steps.

Requirements to install Redis Cluster:

  • Connection to the Internet to download setup files (otherwise, download them manually and copy them over for offline installation)
  • Permissions to install


The installation process starts here.


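  • Update YUM and install the tools needed to download and compile Redis (a minimal CentOS install has no C compiler)
    • yum update
    • yum install wget gcc make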



  • Go to the folder where Redis should be installed
    • cd /opt/
  • Download the Redis stable version
    • wget http://download.redis.io/redis-stable.tar.gz
  • Extract the archive
    • tar xvzf redis-stable.tar.gz
  • Go to the extracted directory
    • cd redis-stable
  • Compile the code
    • make
    • make install
  • To run and test Redis (standalone or as a service), the following files are needed (locations in brackets):
    • redis.conf (/opt/redis-stable/redis.conf)
    • redis-benchmark (/opt/redis-stable/src/redis-benchmark)
    • redis-server (/opt/redis-stable/src/redis-server)
    • redis-check-aof (/opt/redis-stable/src/redis-check-aof)
    • redis-check-dump (/opt/redis-stable/src/redis-check-dump)
    • redis-cli (/opt/redis-stable/src/redis-cli)
  • To set up a cluster, multiple Redis instances need to be running, either on the same machine or on different machines.
  • On the same machine, each instance needs its own configuration file along with its own data and log directories. Create the folder structure as needed; a sample follows.
  • Create a directory named redis-base
    • mkdir /opt/redis-base
  • Copy files from above folders to redis-base folder
    • cp /opt/redis-stable/redis.conf /opt/redis-base
    • cp /opt/redis-stable/src/redis-benchmark /opt/redis-base
    • cp /opt/redis-stable/src/redis-server /opt/redis-base
    • cp /opt/redis-stable/src/redis-check-aof /opt/redis-base
    • cp /opt/redis-stable/src/redis-check-dump /opt/redis-base
    • cp /opt/redis-stable/src/redis-cli /opt/redis-base
  • Create one folder for each Redis instance to be run:
    • mkdir redis_6379
    • mkdir redis_6380
    • mkdir redis_6381
    • mkdir redis_6382
    • mkdir redis_6383
  • Copy the files from redis-base to each of these folders
    • cp /opt/redis-base/* /opt/redis_6379
    • cp /opt/redis-base/* /opt/redis_6380
    • cp /opt/redis-base/* /opt/redis_6381
    • cp /opt/redis-base/* /opt/redis_6382
    • cp /opt/redis-base/* /opt/redis_6383
  • Configure each Redis instance by following the steps below. When several Redis servers run on the same Linux server, each instance must use a different port number. Redis Cluster also uses a cluster bus port (the client port plus 10000), and that port must be free as well.
    • cd /opt/redis_6379
    • cp redis.conf redis.conf.default
    • vi redis.conf
    • port <each instance to have its own port number>
      • Redis_6379 port: 6379
      • Redis_6380 port: 6380
      • Redis_6381 port: 6381
      • Redis_6382 port: 6382
      • Redis_6383 port: 6383
  • Scroll down to the cluster configuration section near the end of the file



  • Change the following settings:
    • Remove the “#” in front of each line below so that the setting is active, using this instance's port in the cluster-config-file name. It should look like:
    • cluster-enabled yes
    • cluster-config-file nodes-6379.conf
    • cluster-node-timeout 15000
    • cluster-slave-validity-factor 10
    • Press “ESC”, type “:wq” and press Enter to save and quit the vi editor
  • Install the service by running the command below
    • /opt/redis-stable/utils/install_server.sh
  • Running install_server.sh prompts for
    • Port Number 
    • Config file
    • Log Directory
    • Data Directory
    • Executable Directory
  • Run install_server.sh once for each Redis instance to be installed.
  • Start Redis Service:
    • service redis_6379 start
  • If required to stop Redis Service:
    • service redis_6379 stop
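  • To list the registered services and their runlevel settings, run the chkconfig command
    • chkconfig --list
  • Connect to each Redis instance and make sure it is running and cluster mode is enabled (cluster_enabled:1)
    • redis-cli -h <host> -p <port number>, for example redis-cli -h 127.0.0.1 -p 6379
  • For client help, run redis-cli --help
  • Once connected, the prompt shows the IP address and port number. At the prompt, type the command “info” and check that the Cluster section shows cluster_enabled:1.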
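The per-instance steps above (one folder per port, a copy of the redis-base files, and the cluster settings in each redis.conf) can be scripted. This is a minimal sketch that assumes the /opt/redis-base layout from this post. setup_instance is a hypothetical helper, and the demo runs against a scratch directory with a stand-in redis.conf. Instead of uncommenting lines in vi, it appends the settings, because Redis uses the last occurrence of a directive in redis.conf:

```shell
#!/usr/bin/env bash
# Sketch: per-instance Redis setup for several ports on one host.
# Assumes a redis-base folder already holds redis.conf and the binaries.
# Redis uses the last occurrence of a directive, so appended settings win.
set -euo pipefail

# setup_instance ROOT PORT  (hypothetical helper)
setup_instance() {
  local root="$1" port="$2"
  local dir="$root/redis_$port"
  mkdir -p "$dir"
  cp "$root"/redis-base/* "$dir/"                  # fresh copy each run
  cp "$dir/redis.conf" "$dir/redis.conf.default"   # untouched copy, as in the manual steps
  cat >> "$dir/redis.conf" <<EOF

# cluster settings for instance $port
port $port
cluster-enabled yes
cluster-config-file nodes-$port.conf
cluster-node-timeout 15000
cluster-slave-validity-factor 10
EOF
}

# Demo against a scratch directory; on a real host ROOT would be /opt.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/redis-base"
echo "port 6379" > "$demo_root/redis-base/redis.conf"   # stand-in for the stock redis.conf
for port in 6379 6380 6381 6382 6383; do
  setup_instance "$demo_root" "$port"
done
tail -n 5 "$demo_root/redis_6381/redis.conf"
```

Each run copies redis.conf from redis-base again before appending, so re-running the script does not stack duplicate settings.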


  • The cluster setup tool (redis-trib.rb) is written in Ruby, so Ruby needs to be installed. Follow the steps below to install Ruby and then configure the cluster.
  • Install all packages required for the Ruby installation
    • yum install gcc-c++ patch readline readline-devel zlib zlib-devel
    • yum install libyaml-devel libffi-devel openssl-devel make
    • yum install bzip2 autoconf automake libtool bison iconv-devel
  • Install Ruby Version Manager (RVM)
    • gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
    • curl -sSL https://get.rvm.io | bash -s stable
      • Optionally add one of --rails, --ruby or --ruby=1.9.3
  • RVM provides a shell script that sets up the RVM environment before installing Ruby
    • source /etc/profile.d/rvm.sh
  • Once RVM is set up, install Ruby
    • rvm install 2.1.2
  • Use RVM to set the default Ruby version
    • rvm use 2.1.2 --default
  • Check the current Ruby version
    • ruby --version
  • The cluster setup script needs the redis Ruby gem. Before running the script, run
    • gem install redis
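  • Once Ruby is installed and all Redis instances are running, run the command below to set up the Redis Cluster, listing every instance as host:port
    • /opt/redis-stable/src/redis-trib.rb create --replicas 0 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383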
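Before running redis-trib.rb, it helps to confirm that each instance reports cluster_enabled:1 and to build the node list from the ports rather than typing it by hand. This is a minimal sketch. The helper names are hypothetical, and the script prints the create command instead of running it. redis-trib.rb refuses to create a cluster with fewer than 3 masters, and the number of masters is the number of nodes divided by (replicas + 1):

```shell
#!/usr/bin/env bash
# Sketch: pre-flight check and redis-trib.rb create command for one host.
# Helper names are hypothetical; the create command is printed, not executed.

# cluster_enabled_from_info: reads INFO output on stdin, succeeds on cluster_enabled:1.
# INFO lines end in CRLF, so carriage returns are stripped first.
cluster_enabled_from_info() {
  tr -d '\r' | grep -qx 'cluster_enabled:1'
}

# trib_create_cmd HOST REPLICAS PORT...
# masters = nodes / (replicas + 1); redis-trib.rb needs at least 3 masters.
trib_create_cmd() {
  local host="$1" replicas="$2"
  shift 2
  if (( $# / (replicas + 1) < 3 )); then
    echo "need at least 3 masters: $# nodes with --replicas $replicas" >&2
    return 1
  fi
  local addrs=() port
  for port in "$@"; do
    addrs+=("$host:$port")
  done
  echo "/opt/redis-stable/src/redis-trib.rb create --replicas $replicas ${addrs[*]}"
}

ports=(6379 6380 6381 6382 6383)

# Only query live servers when redis-cli is available.
if command -v redis-cli >/dev/null 2>&1; then
  for port in "${ports[@]}"; do
    if redis-cli -h 127.0.0.1 -p "$port" info cluster | cluster_enabled_from_info; then
      echo "$port: cluster enabled"
    else
      echo "$port: not cluster enabled or not reachable"
    fi
  done
fi

trib_create_cmd 127.0.0.1 0 "${ports[@]}"
```

With the five instances from this post and --replicas 0, the cluster gets five masters. The same five nodes with --replicas 1 would give only two masters, which redis-trib.rb rejects.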

This completes “Redis Cluster Installation with default options”.



CentOS installation steps – Part 2

Continuing with the installation of CentOS in a virtualized environment. The last step in the previous post was clicking Connect to launch the CentOS bootable image.

1. The CentOS 7 bootable image offers options to

  • Install
  • Test
  • Troubleshoot

2. Move between the options, select “Install CentOS 7” and hit Enter.


3. The installer runs a series of commands. There is nothing to do but wait until they complete.


4. Once these steps are complete, the CENTOS 7 INSTALLATION screen launches.

5. Select the required language and click the “Continue” button in the bottom right corner.


6. Click each of the options below and configure as required.

  • Date & Time: configures the date and time. Set the local time, and set it consistently across all VMs.
  • Keyboard: configures the keyboard layout.
  • Language Support: the languages the OS should support, as described above.
  • Installation Source: auto-detects the ISO image; if it does not, the source can be configured manually.
  • Software Selection: multiple options (with UI, development tools, or minimal install). Minimal Install was used here because the VMs run on very little memory.
    • The UI also did not feel smooth (possibly because it runs on a virtual machine rather than a physical one).



7. Clicking Date & Time opens a map where the country and location can be selected.

8. Network Time Protocol (NTP, the norm in enterprises) can also be configured when connected to a network.

9. Choose the appropriate time zone and formatting details at the bottom, then click “DONE” in the top left corner, below “Date & Time”, to return to the setup configuration screen.


10. Clicking “Installation Source” is optional, as the ISO was mounted in an earlier step.

11. It is still good to check the options: click “Installation Source” to open its configuration window.

12. There is nothing much to change; review and click “Done”.


13. Click Software Selection to choose one of the available options.

  • For a GUI and prebuilt development tools, select GNOME Desktop or KDE Plasma Workspaces
  • The GUI can also be enabled later if required

14. “Minimal Install” was selected here because we want to install software manually using Linux command-line utilities.

15. After choosing the type of software installation, click Done to return to the main installation configuration screen.


16. Click “Installation Destination” to review the options.

17. Leave the defaults; after reviewing, click “Done”.



18. Click Network & Hostname to open the configuration options.

19. If the host is already connected to a network and the virtual machine has network access configured in the Virtual Machine Manager (on the host OS), the guest OS (CentOS) gets an IP address.


20. Once all configuration is complete, there should not be any warnings (red or orange “!” icons and text below each button).

21. Click “Begin Installation” in the bottom right corner to start installing CentOS.


22. While the installation runs in the background, two tasks need to be completed:

  • Setting up the root password
  • Creating a user


23. Click each button and configure as required.

  • The root password must meet the complexity requirements.
  • An additional admin user was created for this setup.


24. After the installation completes, a screen prompting for the “root” user login is presented.


This completes the Linux installation (minimal installation only). The next post covers installing the prerequisite software for the Cloudera Hadoop installation (CDH5).

Until then..


CentOS installation steps on Windows using HyperVisor – Part 1


To build a Hadoop cluster with Cloudera's Hadoop distribution (CDH 5), the first step is to install Linux. Linux can be installed on a virtual server using any of Hyper-V (Microsoft), VMware Workstation (VMware) or VirtualBox (Oracle). This post is a step-by-step guide to installing CentOS 7.0 using Hyper-V and VirtualBox.

Before starting the installation, download the required ISO image from the CentOS site using the link below.

http://isoredirect.centos.org/centos/7/isos/x86_64/ (for this walkthrough, CentOS-7.0-1406-x86_64-DVD.iso was used).


1. Open Hyper-V Manager on the Windows machine.

2. Right-click and select New -> Virtual Machine.

3. The “New Virtual Machine Wizard” launches.


4. Click “Next” in the New Virtual Machine Wizard to go to the “Specify Name and Location” tab.


5. Enter a name for the virtual machine. HDPNameNode was entered here because I was building a Hadoop cluster.

6. Optionally, if an alternative location is required, select “Store the virtual machine in a different location” and enter the path where the VM should be stored.

7. Click Next to go to the “Specify Generation” tab.


8. Select “Generation 1”.

Note: installation with “Generation 2” for Linux still needs to be checked.

9. Click Next to move to “Assign Memory”.


10. In the “Assign Memory” tab, assign memory to the guest OS based on the memory available on the system and the number of VMs that will run.


11. In the “Connect Virtual Hard Disk” section, provide:

  1. Name of the virtual hard disk
  2. Path where the virtual hard disk is stored
  3. Size of the virtual hard disk (16 to 24 GB is good enough)


12. Select the bootable image (ISO) downloaded earlier.


13. Review the virtual machine configuration and click Finish to complete the “New Virtual Machine” wizard.


14. In Hyper-V Manager, the new virtual machine “HDPNameNode” is listed. Right-click the virtual machine name and click Connect.


This launches the Linux bootable image. The next post details the step-by-step installation of Linux (CentOS 7.0).