About the Course
Objective
This course is designed to give you a comprehensive view of the world of cloud computing and Big Data. We cover a multitude of technologies that make up the modern concept of cloud computing and Big Data analytics. Cloud applications and data analytics represent a disruptive change in the way society is informed by, and uses, information.
Eligibility
Interested candidates must have prior knowledge of at least one programming language, data structures and algorithms, and SQL. This course is best suited to freshers who seek a fundamental understanding of Big Data.
Modules
Module 1
In this module, we introduce the concept of cloud computing and the economic foundations that make it viable. We then introduce some fundamental concepts, including software-defined architectures and cloud services. We end the cloud portion of the module with the lowest-level cloud computing service on offer, infrastructure as a service (IaaS). Next, we introduce you to the world of Big Data applications. We start with Apache Spark, a common framework used for many different tasks throughout the course. We then introduce some Big Data distribution packages, the Hadoop Distributed File System (HDFS), and finally the idea of batch-based Big Data processing using the MapReduce programming paradigm.
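To give a feel for the MapReduce paradigm mentioned above, here is a minimal single-machine sketch in plain Python. It is not Hadoop or Spark code; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. It shows the three conceptual steps of a word count: map each record to key/value pairs, group the values by key, and reduce each group to a single result.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cloud"]))
```

In a real cluster the map and reduce steps would run in parallel on many machines and the shuffle would move data across the network; this sketch only illustrates the programming model.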
Module 2
In this module, you will learn about large-scale data storage technologies and frameworks. We start by exploring the challenges of storing large volumes of data in distributed systems. We then discuss in-memory key/value storage systems, NoSQL distributed databases, and distributed publish/subscribe queues. Next, we cover virtualization and containers in greater depth, including lectures on Docker, the JVM and Kubernetes. We finish the module by comparing the infrastructure as a service offerings of the big three: Amazon, Google and Microsoft.
Module 3
In the third module, we introduce Metal as a Service (provisioning real hardware in the cloud), Platform as a Service (providing a platform to run user code on) and web middleware as the glue technology that empowers cloud computing. This module also introduces you to real-time streaming systems, also known as Fast Data. We discuss Apache Storm at length, followed by Apache Spark Streaming and the Lambda and Kappa architectures. Finally, we contrast all these technologies as a streaming ecosystem.
Module 4
In the final module of the course, we focus on data storage in the cloud. We introduce big data and cloud file systems such as HDFS and Ceph, cloud object stores such as OpenStack Swift or Amazon S3, virtualized block storage devices such as Amazon EBS, and archival storage options like Amazon Glacier. We round off the storage topics by introducing the Dropbox cloud API, which enables developers to quickly integrate cloud storage options into their applications. We then discuss the applications of Big Data. In particular, we focus on two topics: graph processing, where massive graphs (such as the web graph) are processed for information, and machine learning, where massive amounts of data are used to train models and to run algorithms such as clustering and frequent pattern mining. We also introduce you to deep learning, where large data sets are used to train neural networks with effective results.
Outcome
- In-depth knowledge of cloud computing with Big Data
- Enhanced knowledge of cloud object stores such as OpenStack Swift and Amazon S3
- Ability to apply tools such as HDFS and Ceph