A Practical Introduction to Stream Processing Training Course
Stream Processing involves real-time processing of "data in motion," which means performing computations on data as it is being received. This type of data is read as continuous streams from various sources such as sensor events, website user activity, financial transactions, credit card swipes, and click streams. Stream Processing frameworks are designed to handle large volumes of incoming data and provide valuable insights almost instantly.
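The "data in motion" idea can be illustrated with a minimal, framework-free sketch: each record is processed the moment it arrives, updating a running result instead of waiting for a complete batch. The sensor events and alert threshold below are invented for illustration only.

```python
# Minimal illustration of record-by-record ("data in motion") processing:
# each incoming event updates a running aggregate immediately, rather than
# being collected into a batch first. The sensor events are simulated.

def sensor_events():
    """Simulated stream of (sensor_id, temperature) readings."""
    yield ("s1", 21.0)
    yield ("s2", 35.5)
    yield ("s1", 22.5)
    yield ("s2", 36.0)

def process_stream(events, alert_threshold=30.0):
    running_avg = {}   # sensor_id -> (count, mean)
    alerts = []
    for sensor_id, temp in events:        # one record at a time
        count, mean = running_avg.get(sensor_id, (0, 0.0))
        count += 1
        mean += (temp - mean) / count     # incremental average
        running_avg[sensor_id] = (count, mean)
        if temp > alert_threshold:
            alerts.append((sensor_id, temp))  # near-instant insight
    return running_avg, alerts

averages, alerts = process_stream(sensor_events())
```

A real framework adds partitioning, fault tolerance, and windowing on top of this basic loop, but the per-record shape of the computation is the same.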
In this instructor-led, live training (either onsite or remote), participants will learn how to set up and integrate different Stream Processing frameworks with existing big data storage systems and related software applications and microservices.
By the end of this training, participants will be able to:
- Install and configure various Stream Processing frameworks, such as Spark Streaming and Kafka Streams.
- Understand and choose the most suitable framework for specific tasks.
- Process data continuously, concurrently, and on a record-by-record basis.
- Integrate Stream Processing solutions with existing databases, data warehouses, and data lakes.
- Select the most appropriate stream processing library to integrate with enterprise applications and microservices.
Audience
- Developers
- Software architects
Format of the Course
- A combination of lectures, discussions, exercises, and extensive hands-on practice.
Course Outline
Introduction
- Stream processing vs batch processing
- Analytics-focused stream processing
Overview of Frameworks and Programming Languages
- Spark Streaming (Scala)
- Kafka Streams (Java)
- Flink
- Storm
- Comparison of Features and Strengths of Each Framework
Overview of Data Sources
- Live data as a series of events over time
- Historical data sources
Deployment Options
- In the cloud (AWS, etc.)
- On-premises (private cloud, etc.)
Getting Started
- Setting up the Development Environment
- Installing and Configuring a Stream Processing Framework
- Assessing Your Data Analysis Needs
Operating a Streaming Framework
- Integrating the Streaming Framework with Big Data Tools
- Event Stream Processing (ESP) vs Complex Event Processing (CEP)
- Transforming the Input Data
- Inspecting the Output Data
- Integrating the Stream Processing Framework with Existing Applications and Microservices
Troubleshooting
Summary and Conclusion
Requirements
- Programming experience in any language
- An understanding of Big Data concepts (Hadoop, etc.)
Testimonials (1)
Sufficient hands-on, trainer is knowledgeable
Chris Tan
Course - A Practical Introduction to Stream Processing
Related Courses
Apache Kafka Connect
7 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at developers who wish to integrate Apache Kafka with existing databases and applications for processing, analysis, etc.
By the end of this training, participants will be able to:
- Use Kafka Connect to ingest large amounts of data from a database into Kafka topics.
- Ingest log data generated by application servers into Kafka topics.
- Make any collected data available for stream processing.
- Export data from Kafka topics into secondary systems for storage and analysis.
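Kafka Connect connectors are configured declaratively and submitted as JSON to the Connect REST API. The sketch below assembles a JDBC source connector configuration as a plain dict; the connector name, connection URL, and table are illustrative placeholders, not values from this course.

```python
import json

# Illustrative JDBC source connector configuration for Kafka Connect.
# All values (name, URL, table) are placeholders; in practice this JSON
# is POSTed to the Connect REST API (e.g. http://localhost:8083/connectors).
connector = {
    "name": "example-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db-host:5432/inventory",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "table.whitelist": "orders",
        "topic.prefix": "db-",       # rows land in the topic "db-orders"
        "tasks.max": "1",
    },
}

payload = json.dumps(connector, indent=2)
```

The same declarative pattern applies to sink connectors that export topic data to secondary systems; only the connector class and its properties change.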
Big Data Streaming for Developers
14 Hours
Learn to implement end-to-end big data streaming use cases: real-time data preparation and maintenance with Informatica, Edge, Kafka, and Spark. This training covers software versions 10.2.1 and up.
Building Kafka Solutions with Confluent
14 Hours
This instructor-led, live training (online or onsite) is designed for engineers who want to utilize Confluent (a distribution of Kafka) to build and manage a real-time data processing platform for their applications.
By the end of this training, participants will be able to:
- Install and configure the Confluent Platform.
- Use Confluent's management tools and services to streamline Kafka operations.
- Store and process incoming stream data efficiently.
- Optimize and manage Kafka clusters effectively.
- Secure data streams with robust measures.
Course Format
- Interactive lecture and discussion.
- Extensive exercises and practice sessions.
- Hands-on implementation in a live-lab environment.
Customization Options for the Course
- This course is based on the open-source version of Confluent: Confluent Open Source.
- To request a customized training for this course, please contact us to arrange.
Building Data Pipelines with Apache Kafka
7 Hours
Apache Kafka is a distributed streaming platform that has become the de facto standard for building data pipelines. It addresses a wide range of data processing use cases, functioning as a message queue, distributed log, stream processor, and more.
We will begin by exploring the theoretical foundations of data pipelines in general, then delve into the core concepts underlying Kafka. Additionally, we will examine key components such as Kafka Streams and Kafka Connect.
Distributed Messaging with Apache Kafka
14 Hours
This course is designed for enterprise architects, developers, system administrators, and anyone interested in understanding and utilizing a high-throughput distributed messaging system. Should you have more specific requirements, such as focusing solely on the system administration aspect, the course can be customized to better meet your needs.
Kafka for Administrators
21 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at system administrators and operations engineers of all levels who wish to use Apache Kafka to deploy, secure, monitor, and troubleshoot Kafka clusters.
By the end of this training, participants will be able to: explain Kafka architecture and KRaft mode, operate and secure Kafka clusters, monitor performance and reliability, and resolve common production issues.
Apache Kafka for Developers
21 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at intermediate-level developers who wish to develop big data applications with Apache Kafka.
By the end of this training, participants will be able to:
- Develop Kafka producers and consumers to send and read data from Kafka.
- Integrate Kafka with external systems using Kafka Connect.
- Write streaming applications with Kafka Streams & ksqlDB.
- Integrate a Kafka client application with Confluent Cloud for cloud-based Kafka deployments.
- Gain practical experience through hands-on exercises and real-world use cases.
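The producer/consumer pattern at the heart of Kafka can be sketched without a broker: producers append keyed records to an ordered log, and each consumer reads from its own offset. The class below is an in-memory stand-in for a Kafka topic, written for orientation only; it is not the Kafka client API.

```python
# In-memory stand-in for a Kafka topic: producers append records to an
# append-only log, and each consumer tracks its own read offset into it.
class MiniTopic:
    def __init__(self):
        self.log = []                      # append-only record log

    def produce(self, key, value):
        self.log.append((key, value))

    def consume(self, offset):
        """Return records from `offset` onward, plus the new offset."""
        records = self.log[offset:]
        return records, len(self.log)

topic = MiniTopic()
topic.produce("order-1", "created")
topic.produce("order-2", "created")

records, offset = topic.consume(0)         # a consumer starting at offset 0
```

Real Kafka adds partitions, replication, and consumer groups on top of this log-plus-offset model, but the model itself is what producer and consumer code is written against.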
Apache Kafka for Python Programmers
7 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at data engineers, data scientists, and programmers who wish to use Apache Kafka features in data streaming with Python.
By the end of this training, participants will be able to use Apache Kafka to monitor and manage conditions in continuous data streams using Python programming.
Security for Apache Kafka
7 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at software testers who wish to implement network security measures into an Apache Kafka application.
By the end of this training, participants will be able to:
- Deploy Apache Kafka onto a cloud-based server.
- Implement SSL encryption to prevent attacks.
- Add ACL authentication to track and control user access.
- Ensure that only trusted clients have access to Kafka clusters through SSL and SASL authentication.
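Securing a Kafka client amounts to setting a handful of configuration properties for encryption and authentication. The sketch below assembles such settings in the style used by librdkafka-based clients (e.g. confluent-kafka); the hostname, certificate path, and credentials are placeholders, and an actual broker would be needed to connect.

```python
# Illustrative Kafka client security settings (placeholder values),
# using librdkafka-style property names: TLS for encryption on the
# wire, plus SASL/SCRAM for authenticating the client. Here they are
# only assembled into a dict, since no broker is available to connect to.
secure_client_config = {
    "bootstrap.servers": "broker.example.com:9093",
    "security.protocol": "SASL_SSL",          # TLS + SASL together
    "ssl.ca.location": "/etc/kafka/ca.pem",   # CA cert to trust the broker
    "sasl.mechanism": "SCRAM-SHA-256",
    "sasl.username": "app-user",
    "sasl.password": "change-me",
}
```

On the broker side, ACLs then restrict what an authenticated principal such as `app-user` may read or write, which is how access is tracked and controlled per user.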
Stream Processing with Kafka Streams
7 Hours
Kafka Streams is a client-side library designed for building applications and microservices that process data within a Kafka messaging system. Traditionally, Apache Kafka has relied on external tools like Apache Spark or Apache Storm to handle data processing between producers and consumers. However, by integrating the Kafka Streams API into an application, data can be processed directly within Kafka, eliminating the need to send it to a separate cluster for processing.
In this instructor-led, live training, participants will learn how to integrate Kafka Streams into a set of sample Java applications that exchange data with Apache Kafka for stream processing.
By the end of this training, participants will be able to:
- Understand the features and advantages of Kafka Streams over other stream processing frameworks
- Process stream data directly within a Kafka cluster
- Develop a Java or Scala application or microservice that integrates with Kafka and Kafka Streams
- Write efficient code that transforms input Kafka topics into output Kafka topics
- Build, package, and deploy the application
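Kafka Streams itself is a Java library, but the shape of a topology, reading an input topic, transforming record by record, and writing an output topic, can be sketched in plain Python for orientation. The in-memory lists below stand in for real Kafka topics, and the word-count transformation is an invented example.

```python
# Plain-Python sketch of the shape of a Kafka Streams topology:
# consume records from an input "topic", transform each one, and
# publish the results to an output "topic". Lists stand in for topics.
input_topic = [("user1", "Hello Kafka"), ("user2", "hello streams")]
output_topic = []

def topology(records):
    """flatMap-style step: split each value into lowercase (word, 1) pairs."""
    for key, value in records:
        for word in value.lower().split():
            yield (word, 1)

# "Run" the topology: every input record becomes zero or more output records.
for record in topology(input_topic):
    output_topic.append(record)
```

In the real Java DSL the same pipeline is declared with operators such as `flatMap` and written back to Kafka with `to()`, with the library handling partitioning and fault tolerance.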
Audience
- Developers
Format of the course
- Part lecture, part discussion, exercises, and extensive hands-on practice
Notes
- To request a customized training for this course, please contact us to arrange
Python and Spark for Big Data for Banking (PySpark)
14 Hours
Python is a high-level programming language known for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python.
Target Audience: Intermediate-level professionals in the banking industry familiar with Python and Spark, seeking to deepen their skills in big data processing and machine learning.
SMACK Stack for Data Science
14 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at data scientists who wish to use the SMACK stack to build data processing platforms for big data solutions.
By the end of this training, participants will be able to:
- Implement a data pipeline architecture for processing big data.
- Develop a cluster infrastructure with Apache Mesos and Docker.
- Analyze data with Spark and Scala.
- Manage unstructured data with Apache Cassandra.
Python and Spark for Big Data (PySpark)
21 Hours
In this instructor-led, live training in Uzbekistan, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.
By the end of this training, participants will be able to:
- Learn how to use Spark with Python to analyze Big Data.
- Work on exercises that mimic real world cases.
- Use different tools and techniques for big data analysis using PySpark.
Microservices with Spring Cloud and Kafka
21 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at developers who wish to transform traditional architecture into a highly concurrent microservices-based architecture using Spring Cloud, Kafka, Docker, Kubernetes and Redis.
By the end of this training, participants will be able to:
- Set up the necessary development environment for building microservices.
- Design and implement a highly concurrent microservices ecosystem using Spring Cloud, Kafka, Redis, Docker and Kubernetes.
- Transform monolithic and SOA services to microservice based architecture.
- Adopt a DevOps approach to developing, testing and releasing software.
- Ensure high concurrency among microservices in production.
- Monitor microservices and implement recovery strategies.
- Carry out performance tuning.
- Learn about future trends in microservices architecture.
Stratio: Rocket and Intelligence Modules with PySpark
14 Hours
Stratio is a data-centric platform that integrates big data, AI, and governance into a single solution. The Rocket and Intelligence modules of Stratio facilitate rapid data exploration, transformation, and advanced analytics in enterprise settings.
This instructor-led, live training (conducted online or on-site) is designed for intermediate-level data professionals who wish to effectively utilize the Rocket and Intelligence modules in Stratio with PySpark, focusing on looping structures, user-defined functions, and advanced data logic.
By the end of this training, participants will be able to:
- Navigate and operate within the Stratio platform using the Rocket and Intelligence modules.
- Apply PySpark for data ingestion, transformation, and analysis.
- Use loops and conditional logic to manage data workflows and feature engineering tasks.
- Create and manage user-defined functions (UDFs) for reusable data operations in PySpark.
Format of the Course
- Interactive lecture and discussion sessions.
- Numerous exercises and practical activities.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.