Big Data Hadoop Tutorial
Hadoop is a software framework designed to handle big data. This Hadoop tutorial covers both fundamental and advanced concepts for novices and experts alike.
Introduction to Big Data Hadoop
Big Data: Big Data refers to extremely large-scale data sets. Typically we work with data in megabytes (Word documents, Excel sheets) or up to gigabytes (movies, code); big data is data at the petabyte scale.
Big Data Hadoop: Hadoop is an Apache open-source framework used for processing and analyzing massive volumes of data. Facebook, Yahoo, Twitter, LinkedIn, and numerous other companies use it.
Modules of Hadoop
HDFS: Hadoop Distributed File System. Files are split into blocks that are stored across the nodes of the distributed architecture.
Yarn: Yet Another Resource Negotiator. It is used for managing the cluster and scheduling jobs.
Map Reduce: A framework that helps Java programs compute data in parallel using key-value pairs. The ‘Map’ task converts input data into an intermediate set of key-value pairs, and the ‘Reduce’ task aggregates those pairs into the final output.
Hadoop Common: The Java libraries used by the other Hadoop modules and required to start Hadoop.
Hadoop Architecture
The Hadoop architecture comprises the MapReduce engine and the Hadoop Distributed File System (HDFS), built on top of the underlying file system.
A Hadoop cluster is made up of several slave nodes and one master node.
Master node: JobTracker, TaskTracker, NameNode, and DataNode.
Slave nodes: TaskTracker and DataNode.
Hadoop Distributed File System
HDFS has a master/slave architecture. This design consists of a single NameNode acting as the master and several DataNodes acting as slaves.
NameNode
The HDFS cluster contains a single master server, the NameNode. Because it is a single node, it can become a single point of failure.
It simplifies the system’s architecture and manages the file system namespace through operations such as opening, renaming, and closing files.
DataNode
The HDFS cluster contains several DataNodes, each holding multiple data blocks that are used to store data.
The DataNode serves read and write requests from file system clients and creates, deletes, and replicates blocks on instruction from the NameNode.
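Once the cluster is running, you can list the DataNodes the NameNode is aware of. As a quick sketch (run as the hadoop user on the master after the daemons are started later in this tutorial):
$ hadoop dfsadmin -report   # prints capacity, usage, and status for each DataNode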
Job Tracker
The JobTracker accepts MapReduce jobs from clients and processes the data with the help of the NameNode, which supplies the JobTracker with the metadata it needs.
Task Tracker
It functions as a slave node to the JobTracker. It receives the task and the code from the JobTracker and applies that code to the file; this procedure is also known as a mapper.
Map Reduce Layer
MapReduce processing begins when the client application submits a MapReduce job to the JobTracker, which forwards the request to the appropriate TaskTrackers.
A TaskTracker occasionally fails or times out; in that case, that portion of the job is rescheduled.
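As an illustration of the map/reduce flow, Hadoop ships with example jobs. After the installation below is complete, a word-count run might look like the following (the jar path, the version, and the /user/test/input HDFS directory are assumptions that depend on your release and installation layout; $HADOOP_INSTALL is set later in this tutorial):
$ hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/test/input /user/test/output
$ hadoop fs -cat /user/test/output/part-r-00000   # each output line is a key-value pair: word <tab> count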
Hadoop Installation
To install Hadoop from a ‘tar ball’ in a UNIX environment, you require the following:
- Java Installation
- SSH installation
- Hadoop Installation and File Configuration
Java Installation
Step 1: If Java is not already installed, download it from
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
The tar file jdk-7u71-linux-x64.tar.gz will be downloaded to your computer.
Step 2: Use the command below to extract the file.
# tar zxf jdk-7u71-linux-x64.tar.gz
Step 3: Move the JDK to /usr/lib and configure the path so that Java is available to all UNIX users.
Switch to the root user at the prompt and enter the following command.
# mv jdk1.7.0_71 /usr/lib/
To configure the path, add the following lines to the ~/.bashrc file.
export JAVA_HOME=/usr/lib/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
You can now verify the installation by typing “java -version” at the prompt.
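For example, a quick check (the exact version string depends on your JDK build):
$ source ~/.bashrc          # reload the updated environment
$ java -version             # should report version 1.7.0_71
$ echo $JAVA_HOME           # should print /usr/lib/jdk1.7.0_71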
SSH Installation
SSH is used so that no password is requested when the master and slave machines communicate. First, create a hadoop user on both the master and slave systems.
# useradd hadoop
# passwd hadoop
To map the nodes, open the host file located in each machine’s /etc/ folder and provide the hostname and IP address.
# vi /etc/hosts
Fill in the lines below.
190.12.1.114 hadoop-master
190.12.1.121 hadoop-slave-one
190.12.1.143 hadoop-slave-two
Configure each node with an SSH key so that it can communicate with the others without a password. The commands are:
# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-one
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-two
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
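As a quick check, the master should now be able to reach each slave without a password prompt, for example:
$ ssh hadoop@hadoop-slave-one   # should open a shell with no password prompt
$ exit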
Hadoop Installation
Download links for Hadoop are available at
http://developer.yahoo.com/hadoop/tutorial/module3.html
Now extract Hadoop and move it to the installation location.
$ mkdir /usr/hadoop
$ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/hadoop
Modify who owns the Hadoop folder.
$ sudo chown -R hadoop /usr/hadoop
Modify the configuration files for Hadoop:
All the configuration files are in /usr/hadoop/etc/hadoop.
Step 1: In hadoop-env.sh file add
export JAVA_HOME=/usr/lib/jvm/jdk/jdk1.7.0_71
Step 2: Add the following to core-site.xml between the configuration tags:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:9000</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
Step 3: Add the following to hdfs-site.xml between the configuration tags:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/usr/hadoop/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/hadoop/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
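The directories referenced above must exist and be writable by the hadoop user. A minimal sketch (depending on the release, Hadoop may create them during format/startup, but creating them explicitly avoids permission problems):
$ sudo mkdir -p /usr/hadoop/dfs/name/data
$ sudo chown -R hadoop /usr/hadoop/dfs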
Step 4: Make the changes indicated below to mapred-site.xml.
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop-master:9001</value>
</property>
</configuration>
Step 5: Lastly, update $HOME/.bashrc.
cd $HOME
vi .bashrc
Append the following lines at the end, then save and exit.
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/jdk1.7.0_71
export HADOOP_INSTALL=/usr/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
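Reload the file so the variables take effect in the current shell, and check that the hadoop command is found (the reported version should match the tarball you extracted):
$ source ~/.bashrc
$ hadoop version   # should print the installed Hadoop version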
Use the following commands to install Hadoop on the slave systems.
# su hadoop
$ cd /usr/hadoop
$ scp -r /usr/hadoop hadoop-slave-one:/usr/
$ scp -r /usr/hadoop hadoop-slave-two:/usr/
Set up the slave and master nodes.
$ vi etc/hadoop/masters
hadoop-master
$ vi etc/hadoop/slaves
hadoop-slave-one
hadoop-slave-two
Next, format the NameNode and start all the daemons.
# su hadoop
$ cd /usr/hadoop
$ bin/hadoop namenode -format
$ cd $HADOOP_HOME/sbin
$ start-all.sh
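As a quick check that the daemons started (the exact process names depend on the Hadoop version, since Hadoop 1.x runs a JobTracker/TaskTracker while Hadoop 2.x runs YARN daemons):
$ jps   # on the master, expect at least a NameNode process; on the slaves, a DataNode process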
HDFS Basic File Operations
Step 1: Transferring data from the local file system to HDFS
Create an HDFS folder first so that data from the local file system can be stored there.
$ hadoop fs -mkdir /user/test
Copy the file “data.txt” from the local folder /usr/home/Desktop to the HDFS folder /user/test
$ hadoop fs -copyFromLocal /usr/home/Desktop/data.txt /user/test
Show the contents of the HDFS folder with the command
$ hadoop fs -ls /user/test
Step 2: Transfer data from HDFS to the local file system with the command
$ hadoop fs -copyToLocal /user/test/data.txt /usr/bin/data_copy.txt
Step 3: Verify that the files are identical by comparing their checksums (on most Linux systems the command is md5sum).
$ md5sum /usr/bin/data_copy.txt /usr/home/Desktop/data.txt
Recursive Deleting
hadoop fs -rmr <arg>
Example: hadoop fs -rmr /user/sonoo/
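Note that -rmr is deprecated in newer Hadoop releases; the equivalent command is:
$ hadoop fs -rm -r /user/sonoo/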
Other HDFS Commands
The commands use the following notation:
- “<path>” denotes the name of any file or directory.
- “<path>…” denotes a file or directory name or names.
- “<file>” can refer to any filename.
- “<src>” and “<dest>” name the source and destination paths in a copy or move operation.
- “<localSrc>” and “<localDest>” are paths on the local file system, similar to those above.
put <localSrc><dest>: It copies the file or directory from the local file system, denoted with localSrc, to dest in the DFS.
copyFromLocal <localSrc><dest>: Similar to -put
moveFromLocal <localSrc><dest>: Copies the file or directory identified by localSrc from the local file system to dest in HDFS, then deletes the local copy on success.
get [-crc] <src><localDest>: Copies the file or directory in HDFS identified by src to the local file system path identified by localDest.
cat <filen-ame>: It shows the contents of the filename on the standard output.
moveToLocal <src><localDest>: Similar to -get, except it removes the HDFS copy upon success.
setrep [-R] [-w] rep <path>: Sets the target replication factor of the files identified by path to rep. (The actual replication factor will move toward the target over time.)
touchz <path>: Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless that file has zero size.
test -[ezd] <path>: Returns 1 if path exists (-e), has zero length (-z), or is a directory (-d); otherwise returns 0.
stat [format] <path>: Prints information about path. The format string accepts file size in blocks (%b), filename (%n), block size (%o), replication (%r), and modification date (%y, %Y).
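For example, a short session combining several of these commands might look like the following (the file and directory names are placeholders):
$ hadoop fs -put /usr/home/Desktop/data.txt /user/test/
$ hadoop fs -cat /user/test/data.txt
$ hadoop fs -setrep -w 2 /user/test/data.txt   # raise the target replication factor to 2
$ hadoop fs -stat "%n %b %r" /user/test/data.txt
$ hadoop fs -test -e /user/test/data.txt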
Conclusion
This Big Data Hadoop tutorial covered the fundamentals of Hadoop technology. We hope it helps you get started with big data analytics. Learn comprehensively with our Big Data Hadoop training in Chennai.