Oracle Big Data Fundamentals Ed 2

Course code

OBDF

Duration

5 Days, 40 Acad. Hours

Course Overview

In the Oracle Big Data Fundamentals course, you learn about big data, the technologies used to process it, and Oracle’s solution for handling it. You also learn to use Oracle Big Data Appliance to process big data, and you gain hands-on experience with the Oracle Big Data Lite VM. You identify how to acquire raw data from a variety of sources and learn to use HDFS and Oracle NoSQL Database to store it. You learn about the data integration options available in Oracle Big Data. These include Oracle Big Data Connectors, which move data to and from Oracle Database; Oracle Data Integrator and Oracle GoldenGate for Big Data, which provide integration and synchronization capabilities for unifying relational and Hadoop data; and Oracle Big Data SQL, which enables dynamic, integrated access to all of your big data, whether it is stored in HDFS, NoSQL, or Oracle Database. Finally, you learn how to analyze your big data using Oracle Big Data SQL, Oracle Advanced Analytics, and Oracle Big Data Spatial and Graph.

Objectives

Course Objectives

Define Big Data
Describe Oracle’s Integrated Big Data Solution and its components
Define Cloudera’s distribution of Hadoop, its core components, and the Hadoop ecosystem
Use the Hadoop Distributed File System (HDFS)
Acquire big data using the Command Line Interface, Flume, and Oracle NoSQL Database
Process big data using MapReduce, YARN, Hive, Oracle XQuery for Hadoop, Solr, and Spark
Integrate big data and warehouse data using Sqoop, Oracle Big Data Connectors, Copy to Hadoop, Oracle Data Integrator, Oracle GoldenGate for Big Data, and Oracle Big Data SQL
Analyze big data using Oracle Big Data SQL, Oracle Big Data Spatial and Graph, and Oracle Advanced Analytics technologies
Use and manage Oracle Big Data Appliance
Identify the key features and benefits of Oracle Big Data Cloud Service
Identify the key features and benefits of Oracle Big Data Cloud Service – Compute Edition

Prerequisites

Suggested Prerequisites

Database Basics and Administration
Exposure to Big Data

Audience

Application Developers
Database Administrators
Database Developers

Course Outline

Course Topics

Introduction

Reviewing the Available Big Data Documentation, Tutorials, and Other Resources
Course Road Map
Course Objectives
Starting the Oracle BDLite VM and Accessing the Practice Files
Questions About You
Oracle Big Data Lite (BDLite) Virtual Machine (VM) Home Page

Introducing Oracle Big Data Strategy

Big Data implementation examples
Importance of Big Data
Oracle strategy for Big Data: combining Big Data Processing Engines: Hadoop / NoSQL / RDBMS
Characteristics of Big Data
Big Data Opportunities: Some Examples
Big Data Challenges

Using Oracle Big Data Lite Virtual Machine and Movieplex Application

Reviewing the Deployment Guide
Oracle Big Data Lite VM Home Page Sections
Introducing the Oracle Movieplex Case Study
Oracle Big Data Lite VM Used in this Course
Importing the Appliance File
Downloading and Running 7-Zip Files to Create the VirtualBox Appliance File
Downloading and Installing Oracle VM VirtualBox and Its Extension Pack
Starting the Big Data Lite VM and Starting and Stopping Services

Introduction to the Big Data Ecosystem

Cloudera’s Distribution Including Apache Hadoop (CDH)
Apache Hadoop
Types of Analysis That Use Hadoop
CDH Architecture and Components
Apache Hadoop Ecosystem
Computer Clusters and Distributed Computing
Types of Data Generated
Apache Hadoop Core Components: HDFS, MapReduce (MR1), and YARN (MR2)

Introduction to the Hadoop Distributed File System

Sample Hadoop High Availability (HA) Cluster
HDFS Files and Blocks
Hadoop Distributed Filesystem (HDFS) Design Principles, Characteristics, and Key Definitions
Interacting With Data Stored in HDFS: Hue, Hadoop Client, WebHDFS, and HttpFS
DataNodes (DN) Daemons Functions
Writing a File to HDFS: Example
Active and Standby Daemons (Services) Functions

Acquire Data using CLI, Fuse, Flume, and Kafka

Kafka topics
Additional Resources
Viewing File System Contents Using the CLI
What is Flume?
Overview of FuseDFS
Loading Data Using the CLI
Reviewing the Command Line Interface (CLI)
FS Shell Commands

Acquire and Access Data Using Oracle NoSQL Database

Oracle NoSQL models: Key-Value and Table
Accessing the KVStore
What is a NoSQL Database
Accessing the CLIs (Data, Admin, SQL)
Acquiring and Accessing Data in a NoSQL DB
HDFS Compared to NoSQL
Define Oracle NoSQL Database
RDBMS Compared to NoSQL

Introduction to MapReduce and YARN Processing Frameworks

Data Locality Optimization in Hadoop
Parallel Processing with MapReduce
YARN Architecture, Features, and Daemons
Hadoop Basic Cluster: MapReduce 1 Versus YARN (MR 2)
MapReduce Framework Features, Benefits, and Jobs
YARN Application Workflow
Word Count Examples
Submitting and Monitoring a MapReduce Job

Resource Management Using YARN

Static Service Pools
Cloudera Manager Dynamic Resource Management: Example
Working with the Fair Scheduler
Cloudera Manager Resource Management Features
First In, First Out (FIFO) Scheduler, Capacity Scheduler, and Fair Scheduler
Submitting and Monitoring a MapReduce Job Using YARN
Job Scheduling in YARN
Using the YARN application Command

Overview of Apache Spark

Benefits of Using Spark
Running a Spark Application on YARN (yarn-cluster Mode)
Spark Interactive Shells: spark-shell and pyspark
Spark Application Components: Driver, Master, Cluster Manager, and Executors
Monitoring Spark Jobs Using YARN’s ResourceManager Web UI
Word Count Example by Using Interactive Scala
Spark Architecture
Resilient Distributed Dataset (RDD)

Overview of Apache Hive

What is Hive?
How is Data Stored in HDFS?
Big Data SQL on Top of Hive Data
Organizing and Describing Data With Hive
Defining Tables Over HDFS
Use Case: Storing Clickstream Data
Hive Queries
Hadoop Architecture

Overview of Cloudera Impala

Hadoop: Some Data Access/Processing Options
Cloudera Impala: Programming Interfaces
How Impala Works with Hive
Cloudera Impala
How Impala Fits Into the Hadoop Ecosystem
Overview of Cloudera Impala
Cloudera Impala: Supported Data Formats
Cloudera Impala: Key Features

Using Oracle XQuery for Hadoop

XQuery Transformation and Basic Filtering
XML Review
Viewing the Completed Query in YARN’s ResourceManager
Running an OXH Query
OXH Features
Oracle XQuery for Hadoop (OXH)
Using OXH: Installation, Functions, Adapters, and Configuration Properties
OXH Data Flow
Overview of Solr
Cloudera Search: Features

Overview of Solr

Apache Solr (Cloudera Search)
Cloudera Search Tasks
Indexing in Cloudera Search
Types of Indexing
The solrctl Command
Cloudera Search: Key Capabilities

Integrating Your Big Data

Comparing Big Data Processing Engines
Unifying Data: A Typical Requirement
Introducing Data Unification Options
When To Use These Options?

Batch Loading Options

Oracle Copy to Hadoop
Oracle Loader for Hadoop
Apache Sqoop

Using Oracle SQL Connector for HDFS

Using OSCH
Performance Tuning
Loading: Choosing a Connector
Parallelism and Performance
Batch and Dynamic Loading: Oracle SQL Connector for HDFS
OSCH Architecture
Features
Key Benefits

Using Oracle Data Integrator and Oracle GoldenGate for Big Data

Oracle GoldenGate for Big Data
ODI’s Declarative Design
Using ODI with Big Data: Heterogeneous Integration with Hadoop Environments
Using ODI Studio
ODI Studio: Big Data Knowledge Modules
ETL and Synchronization: Oracle Data Integrator
ODI Knowledge Modules (KMs): Simpler Physical Design / Shorter Implementation Time
ODI Studio Components: Overview

Using Oracle Big Data SQL

Query Performance Overview
Benefits: Virtualizes data access across Oracle Database, Hadoop and NoSQL stores
Overcoming Big Data Barriers
Barriers to Effective Big Data Adoption
Oracle Big Data SQL: The Hybrid Solution
Deployment Options
Using Oracle Big Data SQL

Using Oracle Big Data Spatial and Graph

BDSG: Graph Analysis
Multimedia Analytics Framework
Deployment Options for Oracle BDSG
Oracle BDSG: Spatial Analysis
Graph and Spatial Analysis: All About Relationships
Additional Resources
Strategy (supported platforms, etc.)
What is Oracle Big Data Spatial and Graph (BDSG)?

Using Oracle Advanced Analytics

OAA: Oracle Data Mining
OAA: Oracle R Enterprise
Oracle Advanced Analytics (OAA)

Oracle Big Data Deployment Options

BDA Hardware and Integrated and Optional Software
Introduction to the Oracle Big Data Cloud Service – Compute Edition
Running the Oracle BDA Configuration Generation Utility
Administering and Securing the Oracle BDA
Introduction to the Oracle Big Data Appliance
Oracle BDA Mammoth Software Deployment Bundle
Introduction to the Oracle Big Data Cloud Service
Using the Oracle BDA mammoth Utility
