PRASHANT NAIR

Notion Press
Old No. 38, New No. 6, McNichols Road, Chetpet, Chennai - 600 031

First Published by Notion Press 2017
Copyright © Prashant Nair 2017
All Rights Reserved.

ISBN 978-1-947752-06-1

This book has been published with all reasonable efforts taken to make the material error-free after the consent of the author. No part of this book shall be used, reproduced in any manner whatsoever without written permission from the author, except in the case of brief quotations embodied in critical articles and reviews.

The Author of this book is solely responsible and liable for its content including but not limited to the views, representations, descriptions, statements, information, opinions and references [“Content”]. The Content of this book shall not constitute or be construed or deemed to reflect the opinion or expression of the Publisher or Editor. Neither the Publisher nor Editor endorse or approve the Content of this book or guarantee the reliability, accuracy or completeness of the Content published herein and do not make any representations or warranties of any kind, express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose. The Publisher and Editor shall not be liable whatsoever for any errors, omissions, whether such errors or omissions result from negligence, accident, or any other cause or claims for loss or damages of any kind, including without limitation, indirect or consequential loss or damage arising out of use, inability to use, or about the reliability, accuracy or sufficiency of the information contained in this book.

Dedication

To my parents, my teachers, my students and my wife!

Contents

Preface
What This Book Covers?
Lab Exercises Covered in This Book
1. Introducing Bigdata & Hadoop
2. Apache Hadoop Installation and Deployment
3. Demystifying HDFS
4. Understanding YARN and Schedulers
5. HDFS Federation and Upgrade
6. Apache Zookeeper Admin Basics
7. High Availability in Apache Hadoop
8. Apache Hive Admin Basics
9. Apache HBase Admin Basics
10. Data Acquisition using Apache Sqoop
11. Apache Oozie
12. Introducing Pig, Spark and Flume

Preface

Bigdata is a booming term in the market. Wherever we go, we see Bigdata; whenever we do a job search, we see Bigdata openings. Bigdata is a term that describes the capability of your software, hardware, framework or architecture to handle data. If your existing architecture fails to load, process or store the data, we can say that you are facing a Bigdata problem.

To solve Bigdata problems in a cost-efficient manner, solution architects started adopting Hadoop. There are many reasons why Hadoop is popular: it is cost-efficient, it is best for batch processing, its capability is proportional to your hardware setup, it integrates multiple processing tools and techniques, and much more.

This book focuses on basic to intermediate concepts and hands-on exercises on Hadoop and its ecosystem components, with respect to administering and managing a typical Hadoop cluster. It is designed for Linux admins, Windows admins, technical managers, DBAs, and technical and solution architects, to name a few.

The book begins by introducing Hadoop, followed by installing, configuring and managing the cluster. We will also cover the common administration tasks usually done by Hadoop engineers, as well as Hadoop HDFS High Availability using QJM, Zookeeper in depth, job schedulers, workflow management using Oozie and much more. So, whether you want to deep-dive into Hadoop or want to understand the bits and pieces of Hadoop, this book is for you.

What This Book Covers?

Chapter 1, Introducing Bigdata and Hadoop, introduces you to the world of Bigdata and explores the roles and responsibilities of a Hadoop administrator.

Chapter 2, Apache Hadoop Installation and Deployment, takes a deep dive, from building hadoop-2.8.0 to installing and configuring Hadoop in standalone, pseudo-distributed and distributed modes.

Chapter 3, Demystifying HDFS, talks in detail about HDFS storage and how it operates, and teaches how to access the HDFS layer using the CLI and the NFS gateway. It also covers some of the common administration tasks associated with HDFS.

Chapter 4, Understanding YARN and Schedulers, helps the reader understand YARN internals and best practices for setting up YARN in a production cluster. It also talks about implementing schedulers such as the Capacity and Fair Schedulers.

Chapter 5, HDFS Federation and Upgrade, helps the reader understand the multi-tenancy concerns of the HDFS architecture, which are addressed using Federation. It also talks about how to implement HDFS Federation in the cluster, how to upgrade an HDFS cluster from Gen1 to Gen2, and how to perform a rolling upgrade.

Chapter 6, Apache Zookeeper Admin Basics, deals with understanding and implementing Zookeeper in standalone and leader-follower modes. We also discuss how to use the Zookeeper CLI to see the content of the Zookeeper filesystem.

Chapter 7, High Availability in Apache Hadoop, deals with understanding how to overcome the single point of failure of the Namenode. We will implement High Availability using QJM. We will also build and implement HA on a federated cluster.

Chapter 8, Apache Hive Admin Basics, talks about how Hive works, installing Hive with a MySQL metastore, and using Hiveserver2 with the Beeline client and securing the same.

Chapter 9, Apache HBase Admin Basics, deals with understanding the HBase architecture, and installing and configuring HBase in single node, single master, and multi-master setups. After that we will learn some basic HBase shell commands and admin commands. Lastly, we will learn how to integrate HBase with Hive and how to bulk load data into HBase.

Chapter 10, Data Acquisition using Apache Sqoop, talks about how to transfer data between an RDBMS and Hadoop. We will learn CLI commands, with some variations, for Sqoop’s import and export operations.

Chapter 11, Apache Oozie, deals with how to create and schedule workflows in Oozie. We will build, install and configure Oozie. Once done, we will learn how to create a simple workflow.

Chapter 12, Installing Apache Pig, Spark and Flume, deals with the installation and configuration of Apache Pig, Apache Spark and Apache Flume.

Lab Exercises Covered in This Book

Chapter No.   Lab No.   Description
2             1         Building Apache Hadoop 2.8.0 from Scratch
2             2         Setting up Apache Hadoop 2.8.0 in Standalone Mode (CLI-Minicluster Mode)
2             3         Setting up Apache Hadoop 2.8.0 in Pseudo-Distributed Mode (Single Node Cluster)
2             4         Setting up Apache Hadoop 2.8.0 in Distributed Mode (Multinode Cluster)
3             5         Working with HDFS Filesystem Shell Commands
3             6         Setting up Replication Factor of an Existing Cluster
3             7         Dynamically Setting up Replication Factor During Specific File Upload
3             8         Setting up Block Size in Existing Hadoop Cluster
3             9         Adding Nodes in an Existing Hadoop Cluster Without Cluster Downtime
3             10        Decommissioning Datanode in Existing Hadoop Cluster Without Data Loss and Downtime in the Cluster
3             11        Whitelisting Datanodes in a Hadoop Cluster
3             12        Working with Safemode (Maintenance Mode) in Hadoop
3             13        Checkpointing Metadata Manually
3             14        Setting up NFS Gateway to Access HDFS
3             15        Setting up Datanode Heartbeat Interval
3             16        Setting up File Quota in HDFS
3             17        Removing File Quota in HDFS
3             18        Setting up Space Quota in HDFS
3             19        Removing Space Quota in HDFS
3             20        Configuring Trash Interval in HDFS and Recovering Data from Trash
4             21        Creating Multiple Users and Groups in Ubuntu System
4             22        Setting up Capacity Scheduler in YARN
4             23        Setting up Fair Scheduler in YARN
5             24        Setting up HDFS Federation in a 4 Node Cluster
5             25        Implementing ViewFS in Existing 4 Node Federated Cluster
5             26        Performing Hadoop Upgrade from Gen1 (1.2.1) to Gen2 (2.7.3)
6             27        Setting up Zookeeper in Standalone Mode
6             28        Setting up Zookeeper in Leader-Follower Mode
6             29        Running Basic Commands in Zookeeper CLI
7             30        Installing and Configuring 4-Node HDFS HA-Enabled Fresh Cluster
7             31        Configuring YARN ResourceManager HA in the 4 Node HDFS HA-Enabled Cluster
7             32        Configuring HDFS and YARN ResourceManager HA in an Existing Non-HA Enabled Cluster Without Any Data Loss
7             33        Building a Federated HA-Enabled Cluster
7             34        Performing Rolling Upgrade from Hadoop-2.7.3 to Hadoop-2.8.0 in an Existing 4 Node HDFS and YARN RM HA-Enabled Cluster
8             35        Setting up Apache Hive with MySQL Database as a Metastore Server
8             36        Connecting Beeline Client to hiveserver2
8             37        Configuring hiveserver2 to Secure Beeline Client Access
8             38        Configuring Hive Credential Store
9             39        Installing and Configuring Apache HBase in Single Node Cluster
9             40        Installing and Configuring Apache HBase Single HMaster Multinode Cluster
9             41        Installing and Configuring Apache HBase Multiple HMaster for HA in a Multinode Cluster
9             42        Working with HBase Shell
9             43        Performing Hive-HBase Integration for Data Interaction
9             44        Bulk Loading the Data in HBase Using Apache Hive
9             45        Bulk Loading Delimited File Directly in HBase
10            46        Installing and Configuring Apache Sqoop 1.4.6
10            47        Listing the Databases in MySQL Using Apache Sqoop
10            48        Listing the Tables in a Database in MySQL Using Apache Sqoop
10            49        Generating DAO for a Table Using Apache Sqoop
10            50        Perform Sqoop Eval for Select Query
10            51        Perform Sqoop Eval for Insert Query
10            52        Importing a Table from Database Having Primary Key Column Using Sqoop
10            53        Importing a Table from Database Without a Primary Key Column and Specifying a Destination Location in HDFS Using Sqoop

BEGINNING APACHE HADOOP ADMINISTRATION

Bigdata is one of the most in-demand markets in the IT sector. If you are an administrator or have a passion for knowing the internal configurations of Hadoop, then this book is for you. This book enables a professional to learn about Hadoop in terms of installation, configuration, and management. It will help the reader get a jump start with the Hadoop framework and its ecosystem components, and slowly progress towards learning the administration part of Hadoop. The level of this book goes from beginner to intermediate, with 70% hands-on exercises. Some of the techniques that you will learn include:

• Installation and configuration of Hadoop cluster
• Performing Hadoop Cluster Upgrade
• Understanding and implementing HDFS Federation
• Understanding and Implementing High Availability
• Implementing HA on a Federated Cluster
• Zookeeper CLI
• Apache Hive Installation and Security
• HBase Multi-master setup
• Oozie installation, configuration and job submission
• Setting up HDFS Quotas
• Setting up HDFS NFS gateway
• Understanding and implementing rolling upgrade

and much more.

Prashant Nair, founder of CognitoIT Consulting Pvt Ltd, developed a keen interest in IT technologies at the age of nineteen, which led him to pursue his passion as a career. His organization provides training and consultancy on niche technologies such as Bigdata, Cloud, Virtualization and DevOps tools. Presently, Prashant is an established corporate trainer and Bigdata consultant with more than twelve years of experience in the fields of datacenter and cluster implementations, cloud computing, Bigdata, DevOps, and Virtualization. He has also worked in the Bigdata domain as a Solution Architect and Hadoop consultant. He has trained lakhs of professionals in Bigdata, Cloud and DevOps tools. He also enjoys writing technical blogs on his website https://bigdataclassmumbai.com. You can connect with him on LinkedIn at https://in.linkedin.com/in/prashant-solution-architect

Price 399
