Course Topics
Introduction
- Reviewing the Available Big Data Documentation, Tutorials, and Other Resources
- Course Road Map
- Course Objectives
- Starting the Oracle BDLite VM and Accessing the Practice Files
- Questions About You
- Oracle Big Data Lite (BDLite) Virtual Machine (VM) Home Page
Introducing Oracle Big Data Strategy
- Big Data Implementation Examples
- Importance of Big Data
- Oracle Strategy for Big Data: Combining Big Data Processing Engines (Hadoop, NoSQL, RDBMS)
- Characteristics of Big Data
- Big Data Opportunities: Some Examples
- Big Data Challenges
Using Oracle Big Data Lite Virtual Machine and Movieplex Application
- Reviewing the Deployment Guide
- Oracle Big Data Lite VM Home Page Sections
- Introducing the Oracle Movieplex Case Study
- Oracle Big Data Lite VM Used in this Course
- Importing the Appliance File
- Downloading and Running 7-Zip Files to Create the VirtualBox Appliance File
- Downloading and Installing Oracle VM VirtualBox and Its Extension Pack
- Starting the Big Data Lite VM and Starting and Stopping Services
Introduction to the Big Data Ecosystem
- Cloudera’s Distribution Including Apache Hadoop (CDH)
- Apache Hadoop
- Types of Analysis That Use Hadoop
- CDH Architecture and Components
- Apache Hadoop Ecosystem
- Computer Clusters and Distributed Computing
- Types of Data Generated
- Apache Hadoop Core Components: HDFS, MapReduce (MR1), and YARN (MR2)
Introduction to the Hadoop Distributed File System
- Sample Hadoop High Availability (HA) Cluster
- HDFS Files and Blocks
- Hadoop Distributed File System (HDFS) Design Principles, Characteristics, and Key Definitions
- Interacting With Data Stored in HDFS: Hue, Hadoop Client, WebHDFS, and HttpFS
- DataNode (DN) Daemon Functions
- Writing a File to HDFS: Example
- Active and Standby Daemons (Services) Functions
Acquire Data using CLI, Fuse, Flume, and Kafka
- Kafka Topics
- Additional Resources
- Viewing File System Contents Using the CLI
- What is Flume?
- Overview of FuseDFS
- Loading Data Using the CLI
- Reviewing the Command Line Interface (CLI)
- FS Shell Commands
Acquire and Access Data Using Oracle NoSQL Database
- Oracle NoSQL Models: Key-Value and Table
- Accessing the KVStore
- What is a NoSQL Database?
- Accessing the CLIs (Data, Admin, SQL)
- Acquiring and Accessing Data in a NoSQL DB
- HDFS Compared to NoSQL
- Define Oracle NoSQL Database
- RDBMS Compared to NoSQL
Introduction to MapReduce and YARN Processing Frameworks
- Data Locality Optimization in Hadoop
- Parallel Processing with MapReduce
- YARN Architecture, Features, and Daemons
- Hadoop Basic Cluster: MapReduce 1 Versus YARN (MR 2)
- MapReduce Framework Features, Benefits, and Jobs
- YARN Application Workflow
- Word Count Examples
- Submitting and Monitoring a MapReduce Job
Resource Management Using YARN
- Static Service Pools
- Cloudera Manager Dynamic Resource Management: Example
- Working with the Fair Scheduler
- Cloudera Manager Resource Management Features
- First In, First Out (FIFO) Scheduler, Capacity Scheduler, and Fair Scheduler
- Submitting and Monitoring a MapReduce Job Using YARN
- Job Scheduling in YARN
- Using the YARN application Command
Overview of Apache Spark
- Benefits of Using Spark
- Running a Spark Application on YARN (yarn-cluster Mode)
- Spark Interactive Shells: spark-shell and pyspark
- Spark Application Components: Driver, Master, Cluster Manager, and Executors
- Monitoring Spark Jobs Using YARN’s ResourceManager Web UI
- Word Count Example by Using Interactive Scala
- Spark Architecture
- Resilient Distributed Dataset (RDD)
Overview of Apache Hive
- What is Hive?
- How is Data Stored in HDFS?
- Big Data SQL on Top of Hive Data
- Organizing and Describing Data With Hive
- Defining Tables Over HDFS
- Use Case: Storing Clickstream Data
- Hive Queries
- Hadoop Architecture
Overview of Cloudera Impala
- Hadoop: Some Data Access/Processing Options
- Cloudera Impala: Programming Interfaces
- How Impala Works with Hive
- Cloudera Impala
- How Impala Fits Into the Hadoop Ecosystem
- Overview of Cloudera Impala
- Cloudera Impala: Supported Data Formats
- Cloudera Impala: Key Features
Using Oracle XQuery for Hadoop
- XQuery Transformation and Basic Filtering
- XML Review
- Viewing the Completed Query in YARN’s ResourceManager
- Running an OXH Query
- OXH Features
- Oracle XQuery for Hadoop (OXH)
- Using OXH: Installation, Functions, Adapters, and Configuration Properties
- OXH Data Flow
- Overview of Solr
- Cloudera Search: Features
Overview of Solr
- Apache Solr (Cloudera Search)
- Cloudera Search Tasks
- Indexing in Cloudera Search
- Types of Indexing
- The solrctl Command
- Cloudera Search: Key Capabilities
Integrating Your Big Data
- Comparing Big Data Processing Engines
- Unifying Data: A Typical Requirement
- Introducing Data Unification Options
- When To Use These Options?
Batch Loading Options
- Oracle Copy to Hadoop
- Oracle Loader for Hadoop
- Apache Sqoop
Using Oracle SQL Connector for HDFS
- Using OSCH
- Performance Tuning
- Loading: Choosing a Connector
- Parallelism and Performance
- Batch and Dynamic Loading: Oracle SQL Connector for HDFS
- OSCH Architecture
- Features
- Key Benefits
Using Oracle Data Integrator and Oracle GoldenGate for Big Data
- Oracle GoldenGate for Big Data
- ODI’s Declarative Design
- Using ODI with Big Data: Heterogeneous Integration with Hadoop Environments
- Using ODI Studio
- ODI Studio: Big Data Knowledge Modules
- ETL and Synchronization: Oracle Data Integrator
- ODI Knowledge Modules (KMs): Simpler Physical Design / Shorter Implementation Time
- ODI Studio Components: Overview
Using Oracle Big Data SQL
- Query Performance Overview
- Benefits: Virtualizes data access across Oracle Database, Hadoop and NoSQL stores
- Overcoming Big Data Barriers
- Barriers to Effective Big Data Adoption
- Oracle Big Data SQL: The Hybrid Solution
- Deployment Options
- Using Oracle Big Data SQL
Using Oracle Big Data Spatial and Graph
- BDSG: Graph Analysis
- Multimedia Analytics Framework
- Deployment Options for Oracle BDSG
- Oracle BDSG: Spatial Analysis
- Graph and Spatial Analysis: All About Relationships
- Additional Resources
- Strategy (supported platforms, etc.)
- What is Oracle Big Data Spatial and Graph (BDSG)?
Using Oracle Advanced Analytics
- OAA: Oracle Data Mining
- OAA: Oracle R Enterprise
- Oracle Advanced Analytics (OAA)
Oracle Big Data Deployment Options
- BDA Hardware and Integrated and Optional Software
- Introduction to the Oracle Big Data Cloud Service – Compute Edition
- Running the Oracle BDA Configuration Generation Utility
- Administering and Securing the Oracle BDA
- Introduction to the Oracle Big Data Appliance
- Oracle BDA Mammoth Software Deployment Bundle
- Introduction to the Oracle Big Data Cloud Service
- Using the Oracle BDA Mammoth Utility