
Snowflake Cookbook

Techniques for building modern cloud data warehousing solutions

Hamid Mahmood Qureshi | Hammad Sharif

FOR SALE IN INDIA ONLY


BIRMINGHAM—MUMBAI

Snowflake Cookbook

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Kunal Parikh
Publishing Product Manager: Ali Abidi
Commissioning Editor: Sunith Shetty
Acquisition Editor: Ali Abidi
Senior Editor: Roshan Kumar
Content Development Editors: Athikho Rishana, Sean Lobo
Technical Editor: Sonam Pandey
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Production Designer: Vijay Kamble

First published: February 2021
Production reference: 1230221

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-80056-061-1

www.packt.com

To my father, whose authoring of countless books was an inspiration. To my mother, who dedicated her life to her children's education and well-being. – Hamid Qureshi

To my dad and mom for unlimited prayers and (according to my siblings, a bit extra) love. I cannot thank and appreciate you enough. To my wife and the mother of my children for her support and encouragement throughout this and other treks made by us. – Hammad Sharif

Contributors

About the authors

Hamid Qureshi is a senior cloud and data warehouse professional with almost two decades of experience, having architected, designed, and led the implementation of several data warehouse and business intelligence solutions. He has extensive experience and certifications across various data analytics platforms, ranging from Teradata, Oracle, and Hadoop to modern, cloud-based tools such as Snowflake. Drawing on his extensive work with traditional technologies and his knowledge of modern platforms, he has accumulated substantial practical expertise in data warehousing and analytics in Snowflake, which he has subsequently captured in his publications.

I want to thank the people who have helped me on this journey: my co-author, Hammad; our technical reviewer, Hassaan; the Packt team; and my loving wife and children for their support throughout.

Hammad Sharif is a data architect with more than a decade of experience in the information domain, covering governance, warehousing, data lakes, streaming data, and machine learning. He has worked with a leading data warehouse vendor for a decade as part of a professional services organization, advising customers in the telco, retail, life sciences, and financial industries across Asia, Europe, and Australia during presales and post-sales implementation cycles. Hammad holds an MSc in computer science and has published conference papers in the domains of machine learning, sensor networks, software engineering, and remote sensing.

I would like to first and foremost thank my loving wife and children for their patience and encouragement throughout the long process of writing this book. I'd also like to thank Hamid for inviting me to be his partner in crime and for his patience, my publishing team for their guidance, and the reviewers for helping improve this work.

About the reviewers

Hassaan Sajid has around 12 years of experience in data warehousing and business intelligence in the retail, telecommunications, banking, insurance, and government sectors. He has worked with various clients in Australia, the UAE, Pakistan, Saudi Arabia, and the USA in multiple BI/data warehousing roles, including BI architect, BI developer, ETL developer, data modeler, operations analyst, data analyst, and technical trainer. He holds a master's degree in BI and is a professional Scrum Master. He is also certified in Snowflake, MicroStrategy, Tableau, Power BI, and Teradata. His hobbies include reading, traveling, and photography.

Buvaneswaran Matheswaran has a bachelor's degree in electronics and communication engineering from the Government College of Technology, Coimbatore, India. He had the opportunity to work on Snowflake in its very early stages and has more than 4 years of Snowflake experience. He has done extensive work and research on Snowflake as an enterprise admin. He has worked mainly for Fortune 500 companies in the retail and consumer packaged goods (CPG) sectors. He is deeply passionate about cloud technologies, data security, performance tuning, and cost optimization. This is the first time he has done a technical review for a book, and he enjoyed the experience immensely. He has learned a lot as a user and has also shared his experience as a veteran Snowflake admin.

Daan Bakboord is a self-employed data and analytics consultant from the Netherlands. His passion is collecting, processing, storing, and presenting data. He has a simple motto: a customer must be able to make decisions based on facts and within the right context. DaAnalytics is his personal (online) label. He provides data and analytics services and has been active in Oracle Analytics since the mid-2000s. Since the end of 2017, his primary focus has been cloud analytics. Focused on Snowflake and its ecosystem, he is SnowPro Core certified and, thanks to his contributions to the community, has been recognized as a Snowflake Data Hero. He is also Managing Partner, Data and Analytics, at Pong, a professional services provider that focuses on data-related challenges.

Table of Contents

Preface

Chapter 1: Getting Started with Snowflake
• Technical requirements
• Creating a new Snowflake instance
• Creating a tailored multi-cluster virtual warehouse
• Using the Snowflake WebUI and executing a query
• Using SnowSQL to connect to Snowflake
• Connecting to Snowflake with JDBC
• Creating a new account admin user and understanding built-in roles

Chapter 2: Managing the Data Life Cycle
• Technical requirements
• Managing a database
• Managing a schema
• Managing tables
• Managing external tables and stages
• Managing views in Snowflake

Chapter 3: Loading and Extracting Data into and out of Snowflake
• Technical requirements
• Configuring Snowflake access to private S3 buckets
• Loading delimited bulk data into Snowflake from cloud storage
• Loading delimited bulk data into Snowflake from your local machine
• Loading Parquet files into Snowflake
• Making sense of JSON semi-structured data and transforming to a relational view
• Processing newline-delimited JSON (or NDJSON) into a Snowflake table
• Processing near real-time data into a Snowflake table using Snowpipe
• Extracting data from Snowflake

Chapter 4: Building Data Pipelines in Snowflake
• Technical requirements
• Creating and scheduling a task
• Conjugating pipelines through a task tree
• Querying and viewing the task history
• Exploring the concept of streams to capture table-level changes
• Combining the concept of streams and tasks to build pipelines that process changed data on a schedule
• Converting data types and Snowflake's failure management
• Managing context using different utility functions

Chapter 5: Data Protection and Security in Snowflake
• Technical requirements
• Setting up custom roles and completing the role hierarchy
• Configuring and assigning a default role to a user
• Delineating user management from security and role management
• Configuring custom roles for managing access to highly secure data
• Setting up development, testing, pre-production, and production database hierarchies and roles
• Safeguarding the ACCOUNTADMIN role and users in the ACCOUNTADMIN role

Chapter 6: Performance and Cost Optimization
• Technical requirements
• Examining table schemas and deriving an optimal structure for a table
• Identifying query plans and bottlenecks
• Weeding out inefficient queries through analysis
• Identifying and reducing unnecessary Fail-safe and Time Travel storage usage
• Projections in Snowflake for performance
• Reviewing query plans to modify table clustering
• Optimizing virtual warehouse scale

Chapter 7: Secure Data Sharing
• Technical requirements
• Sharing a table with another Snowflake account
• Sharing data through a view with another Snowflake account
• Sharing a complete database with another Snowflake account and setting up future objects to be shareable
• Creating reader accounts and configuring them for non-Snowflake sharing
• Keeping costs in check when sharing data with non-Snowflake users

Chapter 8: Back to the Future with Time Travel
• Technical requirements
• Using Time Travel to return to the state of data at a particular time
• Using Time Travel to recover from the accidental loss of table data
• Identifying dropped databases, tables, and other objects and restoring them using Time Travel
• Using Time Travel in conjunction with cloning to improve debugging
• Using cloning to set up new environments based on the production environment rapidly

Chapter 9: Advanced SQL Techniques
• Technical requirements
• Managing timestamp data
• Shredding date data to extract Calendar information
• Unique counts and Snowflake
• Managing transactions in Snowflake
• Ordered analytics over window frames
• Generating sequences in Snowflake

Chapter 10: Extending Snowflake Capabilities
• Technical requirements
• Creating a Scalar user-defined function using SQL
• Creating a Table user-defined function using SQL
• Creating a Scalar user-defined function using JavaScript
• Creating a Table user-defined function using JavaScript
• Connecting Snowflake with Apache Spark
• Using Apache Spark to prepare data for storage on Snowflake

Other Books You May Enjoy
• Why subscribe?

Index

Preface

Understanding an analytics technology is an important first step before embarking on delivering data analytics solutions, particularly in the cloud. This book introduces Snowflake tools and techniques you can use to tame challenges associated with data management, warehousing, and analytics. The cloud offers quick onboarding, but novice users who lack the knowledge to use Snowflake efficiently to build and maintain a data warehouse may find that trial and error leads to higher bills. This book provides a practical introduction and guidance for those who have used other analytics and data warehousing technologies, either on-premises or in the cloud, and who are keen to transfer their skills to Snowflake. The book presents practical examples of tasks typically involved in data warehousing and analytics, explained simply and supported by code. It walks you through the user interface and management console offered by Snowflake and shows how to get started by creating an account. It also works through examples of loading data and delivering analytics with different Snowflake capabilities, and touches on extending Snowflake using stored procedures and user-defined functions. Finally, it covers integrating Snowflake with Java and Apache Spark so that Snowflake can coexist with a data lake. By the end of this book, you will be able to build applications on Snowflake that can serve as the building blocks of a larger solution encompassing security, governance, the data life cycle, and the distribution of data on Snowflake.

Who this book is for

The book acts as a reference for users who want to learn about Snowflake through a practical approach. The recipe-based approach allows the different personas in data management to pick and choose what they want to learn, as and when required. The recipes are independent of one another, and each starts by helping you understand the environment. The recipes require basic SQL and data warehousing knowledge.

Snowflake Cookbook

Snowflake is a unique cloud-based data warehousing platform built from scratch to tackle data management on the cloud. This book introduces Snowflake’s unique architecture, which places it at the forefront of cloud data warehouses. We will explore the compute model available with Snowflake and how Snowflake allows extensive scaling through virtual warehouses. You will learn how to configure a virtual warehouse for optimizing cost and performance. You will explore the data ecosystem and discover how Snowflake integrates with other technologies for staging and loading data. As you progress through the chapters, you will leverage Snowflake’s capabilities to process a series of SQL statements using tasks to build data pipelines and find out how you can create modern data solutions and pipelines designed to provide high performance and scalability. You will also get to grips with creating role hierarchies, adding custom roles, and setting default roles for users before covering advanced topics such as data sharing, cloning, and performance optimization. By the end of this Snowflake book, you will be well-versed in Snowflake’s architecture for building modern analytical solutions and understand best practices for solving commonly faced problems using practical recipes.

Things you will learn:

• Data warehousing techniques aligned with Snowflake’s cloud architecture
• Broad skills for data warehouse designers to cover the Snowflake ecosystem and tooling
• Transfer skills from on-premises data warehousing to the Snowflake cloud analytics platform
• Optimize performance and costs associated with a Snowflake solution
• Stage data on object stores and load it into Snowflake
• Secure data and share it efficiently for access in a controlled manner
• Manage transactions and extend Snowflake using stored procedures
• Extend cloud data applications using the Spark Connector
