Python / Hadoop
Python
Basic Introduction: Introduction to Python, History, Features, Installation, Command interpreter and development environment (IDLE), Applications of Python, Python 2/3 differences, Basic program structure (quotation and indentation), Operators, Basic data types and built-in objects
Functions and Sequences: Functions: definition and use, Arguments, Block structure, Scope, Recursion, Advanced argument passing, Conditionals and Boolean expressions, Sequences: Strings, Tuples, Lists, Iteration, looping and control flow, String methods and formatting
OOP concepts: Object-oriented concepts (Encapsulation, Polymorphism), Classes, Class instances, Constructors and Destructors (__init__, __del__), Multiple inheritance, Operator overloading, Properties, Special methods, Emulating built-in types
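As an illustration of the special-method topics in this unit, the following is a minimal sketch rather than course material. The class name Vector2D and its attributes are hypothetical, chosen only to show a constructor, a read-only property, operator overloading, and emulation of a built-in sequence.

```python
class Vector2D:
    """Hypothetical 2D vector used only to illustrate special methods."""

    def __init__(self, x, y):
        # Constructor: runs when Vector2D(x, y) is called.
        self._x = x
        self._y = y

    @property
    def x(self):
        # Read-only property: v.x works, v.x = 1 raises AttributeError.
        return self._x

    @property
    def y(self):
        return self._y

    def __add__(self, other):
        # Operator overloading: defines the behaviour of v1 + v2.
        if not isinstance(other, Vector2D):
            return NotImplemented
        return Vector2D(self._x + other._x, self._y + other._y)

    def __eq__(self, other):
        # Equality by value instead of by identity.
        if not isinstance(other, Vector2D):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __len__(self):
        # Emulating a built-in sequence: len(v) is always 2.
        return 2

    def __getitem__(self, index):
        # Indexing and iteration: v[0], v[1], list(v).
        return (self._x, self._y)[index]

    def __repr__(self):
        return f"Vector2D({self._x}, {self._y})"


total = Vector2D(1, 2) + Vector2D(3, 4)
print(total)        # uses __repr__
print(list(total))  # iterates through __getitem__
```

Returning NotImplemented from __add__ and __eq__ lets Python try the other operand's method before raising an error, which is the usual convention for overloaded operators.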
Mutable data types, Exceptions and Standard modules: Dictionaries, Sets and Mutability, Files and Text Processing, Exceptions, List and Dict Comprehensions, Lambda, Functions as Objects, Standard Modules (math, random), Packages
Advanced Methods: Modules and Packages, Iterators and Generators (special methods __iter__() and __next__()), Decorators, Closures, Property, Context Managers, Context Decorator, Regular expressions (re.match, re.search, re.findall)
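The iterator protocol and the three regular-expression functions named in this unit can be sketched as follows. This is an illustrative example only; the names Countdown, countdown, and the sample text are hypothetical.

```python
import re


class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        # Raising StopIteration tells the for loop to stop.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator version: yield produces the same protocol automatically."""
    while start > 0:
        yield start
        start -= 1


print(list(Countdown(3)))
print(list(countdown(3)))

text = "order 66 shipped, order 7 pending"

# re.match only succeeds if the pattern matches at the start of the string.
starts_with_order = re.match(r"order", text) is not None
starts_with_digit = re.match(r"\d+", text) is not None

# re.search finds the first match anywhere in the string.
first_number = re.search(r"\d+", text).group()

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", text)

print(starts_with_order, starts_with_digit, first_number, all_numbers)
```

The class and the generator produce the same sequence; the generator form is usually shorter because Python supplies __iter__() and __next__() for it.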
Hadoop
Introduction to Big Data and Hadoop:- (Big Data Introduction, Hadoop Introduction, What is Hadoop?, Why Hadoop?, Hadoop History, Components of the Hadoop Ecosystem: HDFS, MapReduce, PIG, Hive, SQOOP, HBASE, OOZIE, Flume, Zookeeper and so on, Scope of Hadoop)
Deep Dive into HDFS (Storing Data):- (Introduction to HDFS, HDFS Design, HDFS role in Hadoop, Features of HDFS, Daemons of Hadoop and their functionality: Name Node, Secondary Name Node, Job Tracker, Data Node, Task Tracker, Anatomy of File Write, Anatomy of File Read, Network Topology: Nodes, Racks, Data Center, Basic Configuration for HDFS, Data Organization: Blocks and Replication, Rack Awareness, Heartbeat Signal, How to Store Data in HDFS, How to Read Data from HDFS, Accessing HDFS (Introduction to Basic UNIX commands), CLI commands (Hadoop FS shell))
Hadoop Installation/Setup:- (Downloading and installing Ubuntu 14.x, Installing Java, Installing Hadoop in Standalone, Pseudo-Distributed and Fully Distributed modes, Installation of Hortonworks/Cloudera package, Monitoring the Cluster Health (Ambari), Starting and Stopping the Nodes)
MapReduce using Java (Processing):- (Introduction to MapReduce, MapReduce Architecture, Data flow in MapReduce: Splits, Mapper, Partitioning, Sort and shuffle, Combiner, Reducer, Difference between Block and InputSplit, Role of RecordReader, Basic Configuration of MapReduce, MapReduce life cycle: Driver Code, Mapper, Reducer, How MapReduce Works, Writing and Executing a Basic MapReduce Program using Java, Submission & Initialization of a MapReduce Job, File Input/Output Formats in MapReduce Jobs: Text Input Format, Key Value Input Format, Sequence File Input Format, Joins: Map-side Joins, Reduce-side Joins, Word Count Example, ToolRunner, Debugging, Performance Fine tuning, Partition MapReduce Program, Side Data Distribution, Distributed Cache (with Program), Counters (with Program), Types of Counters: Task Counters, Job Counters, User Defined Counters, Propagation of Counters)
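The data flow listed in this unit (mapper, partitioning, sort and shuffle, reducer) can be sketched with the Word Count example. Note that this is a plain-Python simulation in a single process, not the Hadoop Java API that the unit itself covers; all function names are hypothetical, and the partition function is a simple deterministic stand-in for hashing the key.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def partition(key, num_reducers):
    """Choose the reducer for a key (assumed stand-in for a hash partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs, num_reducers):
    """Group values by key inside each partition, then sort each partition by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)][key].append(value)
    return [sorted(p.items()) for p in partitions]


def reducer(key, values):
    """Reduce phase: sum all the counts collected for one word."""
    return key, sum(values)


def word_count(lines, num_reducers=2):
    """Driver: wire the phases together and collect the reducer output."""
    mapped = [pair for line in lines for pair in mapper(line)]
    result = {}
    for part in shuffle_and_sort(mapped, num_reducers):
        for key, values in part:
            word, total = reducer(key, values)
            result[word] = total
    return result


counts = word_count(["big data big", "data hadoop"])
print(counts)
```

Because every occurrence of a word is sent to the same partition, each reducer sees all the values for its keys, which is the property the partitioning and shuffle steps exist to guarantee.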
Job Scheduling: PIG:- (Introduction to Apache PIG, Introduction to PIG Data Flow Engine, MapReduce vs. PIG in detail, When should PIG be used?, Data Types in PIG, Basic PIG programming, Modes of Execution in PIG: Local Mode and MapReduce Mode, Execution Mechanisms: Grunt Shell, Script, Embedded, Operators/Transformations in PIG, PIG UDFs with Program, Word Count Example in PIG, Difference between MapReduce and PIG)