SMACK Stack for Data Science Training Course
SMACK is a suite of data platform software comprising Apache Spark, Apache Mesos, Akka, Apache Cassandra, and Apache Kafka. By using the SMACK stack, users can develop and scale robust data processing platforms.
This instructor-led, live training (available both online and onsite) is designed for data scientists who want to use the SMACK stack to build data processing platforms for big data solutions.
By the end of this training, participants will be able to:
- Implement a data pipeline architecture capable of handling large-scale data processing.
- Set up a cluster infrastructure using Apache Mesos and Docker.
- Perform data analysis with Spark and Scala.
- Manage unstructured data effectively with Apache Cassandra.
Format of the Course
- Interactive lectures and discussions.
- Extensive exercises and practical activities.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange it.
Course Outline
Introduction
SMACK Stack Overview
- What is Apache Spark? Apache Spark features
- What is Apache Mesos? Apache Mesos features
- What is Akka? Akka features
- What is Apache Cassandra? Apache Cassandra features
- What is Apache Kafka? Apache Kafka features
Scala Language
- Scala syntax and structure
- Scala control flow
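To give a flavor of the material in this module, here is a minimal, illustrative Scala sketch (not taken from the course labs) showing basic syntax and control flow: vals, a for comprehension, and pattern matching.

```scala
// Illustrative only: basic Scala syntax and control flow.
object ScalaBasics extends App {
  val numbers = List(1, 2, 3, 4, 5)

  // for comprehension with a guard
  val evens = for (n <- numbers if n % 2 == 0) yield n

  // pattern matching as an expression
  def describe(n: Int): String = n match {
    case 0          => "zero"
    case x if x < 0 => "negative"
    case _          => "positive"
  }

  evens.foreach(n => println(s"$n is ${describe(n)}"))
}
```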
Preparing the Development Environment
- Installing and configuring the SMACK stack
- Installing and configuring Docker
Akka
- Using actors
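As a hedged sketch of the actor model covered here (classic Akka actors in Scala 2 style; the names and messages are illustrative, not course material):

```scala
import akka.actor.{Actor, ActorSystem, Props}

// A minimal actor that greets by name and logs anything it does not understand.
class Greeter extends Actor {
  def receive: Receive = {
    case name: String => println(s"Hello, $name")
    case other        => println(s"Unexpected message: $other")
  }
}

object GreeterApp extends App {
  val system  = ActorSystem("smack-demo")
  val greeter = system.actorOf(Props(new Greeter), "greeter")

  greeter ! "SMACK"   // handled by the String case
  greeter ! 42        // falls through to the catch-all case

  Thread.sleep(500)   // give the actor time to process before shutdown
  system.terminate()
}
```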
Apache Cassandra
- Creating a database for read operations
- Working with backups and recovery
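The sketch below illustrates modeling a table around its read path and querying it from Scala; it assumes the DataStax Java driver 4.x, Scala 2.13, and a local Cassandra node, and the keyspace, table, and datacenter names are placeholders.

```scala
import com.datastax.oss.driver.api.core.CqlSession
import scala.jdk.CollectionConverters._

object CassandraReadModel extends App {
  // With no explicit contact points the driver connects to localhost:9042.
  val session = CqlSession.builder().withLocalDatacenter("datacenter1").build()

  session.execute(
    """CREATE KEYSPACE IF NOT EXISTS smack
      |WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}""".stripMargin)

  // Partition by sensor_id so all readings for a sensor can be read in one query.
  session.execute(
    """CREATE TABLE IF NOT EXISTS smack.readings (
      |  sensor_id text, ts timestamp, value double,
      |  PRIMARY KEY (sensor_id, ts))""".stripMargin)

  val rows = session.execute("SELECT ts, value FROM smack.readings WHERE sensor_id = 'sensor-1'")
  rows.asScala.foreach(row => println(s"${row.getInstant("ts")} -> ${row.getDouble("value")}"))

  session.close()
}
```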
Connectors
- Creating a stream
- Building an Akka application
- Storing data with Cassandra
- Reviewing connectors
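Conceptually, these connector exercises wire a stream through an Akka application into a store. The sketch below uses plain Akka Streams with a stand-in sink; a real pipeline would swap in a Kafka source and a Cassandra sink (for example via Alpakka), which are not shown here.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}

object ConnectorSketch extends App {
  // Since Akka 2.6 an implicit ActorSystem is enough to materialize streams.
  implicit val system: ActorSystem = ActorSystem("connector-sketch")

  // Stand-in for a real connector sink (e.g. an Alpakka Cassandra or Kafka sink).
  def store(record: String): Unit = println(s"storing: $record")

  Source(1 to 5)
    .via(Flow[Int].map(n => s"record-$n"))   // transformation stage
    .runWith(Sink.foreach(store))            // terminal "connector" stage
    .onComplete(_ => system.terminate())(system.dispatcher)
}
```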
Apache Kafka
- Working with clusters
- Creating, publishing, and consuming messages
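A minimal sketch of publishing messages from Scala with the standard Kafka producer client; the broker address and topic name are placeholders, and consuming would follow the same style with KafkaConsumer.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaPublishSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")  // placeholder broker address
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)

  // Publish a few messages to an example topic.
  (1 to 3).foreach { i =>
    producer.send(new ProducerRecord[String, String]("smack-events", s"key-$i", s"event $i"))
  }

  producer.flush()
  producer.close()
}
```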
Apache Mesos
- Allocating resources
- Running clusters
- Working with Apache Aurora and Docker
- Running services and jobs
- Deploying Spark, Cassandra, and Kafka on Mesos
Apache Spark
- Managing data flows
- Working with RDDs and DataFrames
- Performing data analysis
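As an illustrative sketch of these Spark topics (local mode, small in-memory data; the dataset names and values are made up for the example):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

object SparkAnalysisSketch extends App {
  val spark = SparkSession.builder()
    .appName("smack-analysis")
    .master("local[*]")   // local mode for the sketch; a Mesos master URL would be used on a cluster
    .getOrCreate()
  import spark.implicits._

  // RDD example: word counts over an in-memory collection.
  spark.sparkContext
    .parallelize(Seq("spark mesos akka", "cassandra kafka spark"))
    .flatMap(_.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
    .collect()
    .foreach(println)

  // DataFrame example: average reading per sensor.
  val readings = Seq(("sensor-1", 21.5), ("sensor-1", 22.0), ("sensor-2", 19.8))
    .toDF("sensor_id", "value")
  readings.groupBy("sensor_id").agg(avg("value").as("avg_value")).show()

  spark.stop()
}
```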
Troubleshooting
- Handling service failures and errors
Summary and Conclusion
Requirements
- An understanding of data processing systems
Audience
- Data Scientists
Testimonials (1)
very interactive...
Richard Langford
Course - SMACK Stack for Data Science
Related Courses
Anaconda Ecosystem for Data Scientists
14 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at data scientists who wish to use the Anaconda ecosystem to capture, manage, and deploy packages and data analysis workflows in a single platform.
By the end of this training, participants will be able to:
- Install and configure Anaconda components and libraries.
- Understand the core concepts, features, and benefits of Anaconda.
- Manage packages, environments, and channels using Anaconda Navigator.
- Use Conda, R, and Python packages for data science and machine learning.
- Get to know some practical use cases and techniques for managing multiple data environments.
A Practical Introduction to Data Science
35 Hours
Participants who complete this training will gain practical, real-world insights into Data Science and its related technologies, methodologies, and tools.
The participants will have the opportunity to apply this knowledge through hands-on exercises. Group interaction and instructor feedback are integral components of the course.
The course begins with an introduction to fundamental concepts in Data Science, then delves into the tools and methodologies used in the field.
Audience
- Developers
- Technical Analysts
- IT Consultants
Format of the Course
- A blend of lectures, discussions, exercises, and extensive hands-on practice
Note
- To request a customized training for this course, please contact us to arrange it.
Data Science Programme
245 Hours
The unprecedented explosion of information and data in today’s world is driving innovation at an unparalleled pace. Data Scientist is one of the most sought-after roles across industries today.
We offer more than just theoretical learning; we provide practical, marketable skills that bridge the gap between academic knowledge and industry demands.
This 7-week curriculum can be customized to meet your specific industry requirements. For further information, please contact us or visit the Nobleprog Institute website.
Audience:
This program is designed for postgraduate-level individuals as well as anyone with the necessary prerequisite skills, which will be assessed through an evaluation and interview process.
Delivery:
The course delivery combines Instructor-Led Classroom sessions and Instructor-Led Online sessions. Typically, the first week is conducted in a classroom setting, weeks 2 to 6 are held in a virtual classroom, and the final week returns to a classroom environment.
Data Science for Big Data Analytics
35 Hours
Big data refers to extremely large and intricate datasets that exceed the capabilities of conventional data processing applications. The challenges associated with big data encompass various aspects such as data capture, storage, analysis, searching, sharing, transferring, visualizing, querying, updating, and ensuring information privacy.
Data Science essential for Marketing/Sales professionals
21 Hours
This course is designed for Marketing and Sales Professionals who are looking to delve deeper into the application of data science in their respective fields. The course offers comprehensive coverage of various data science techniques used for upselling, cross-selling, market segmentation, branding, and customer lifetime value (CLV).
Difference Between Marketing and Sales - How do sales and marketing differ?
In simple terms, sales can be described as a process that focuses on individuals or small groups. On the other hand, marketing targets a broader audience or the general public. Marketing encompasses research (identifying customer needs), product development (creating innovative products), and promotion (through advertising) to raise awareness about the product among consumers. Essentially, marketing involves generating leads or prospects. Once the product is available in the market, it is the salesperson's role to persuade customers to make a purchase. While marketing aims for long-term goals, sales focus on achieving short-term objectives.
Jupyter for Data Science Teams
7 Hours
This instructor-led, live training in Uzbekistan (online or onsite) introduces the idea of collaborative development in data science and demonstrates how to use Jupyter to track and participate as a team in the "life cycle of a computational idea". It walks participants through the creation of a sample data science project built on top of the Jupyter ecosystem.
By the end of this training, participants will be able to:
- Install and configure Jupyter, including the creation and integration of a team repository on Git.
- Use Jupyter features such as extensions, interactive widgets, multi-user mode, and more to enable project collaboration.
- Create, share and organize Jupyter Notebooks with team members.
- Choose from Scala, Python, and R to write and execute code against big data systems such as Apache Spark, all through the Jupyter interface.
Kaggle
14 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at data scientists and developers who wish to learn and build their careers in Data Science using Kaggle.
By the end of this training, participants will be able to:
- Learn about data science and machine learning.
- Explore data analytics.
- Learn about Kaggle and how it works.
MATLAB Fundamentals, Data Science & Report Generation
35 Hours
In the initial segment of this training, we delve into the foundational aspects of MATLAB, exploring its role as both a programming language and an integrated platform. This section covers an introduction to MATLAB's syntax, arrays and matrices, data visualization techniques, script creation, and object-oriented programming principles.
During the second part, we showcase how MATLAB can be utilized for data mining, machine learning, and predictive analytics. To provide participants with a clear and practical understanding of MATLAB's capabilities, we compare its use to other tools such as spreadsheets, C, C++, and Visual Basic.
In the third segment, participants will learn techniques to streamline their work by automating data processing and report generation tasks.
Throughout the training, participants will apply the concepts learned through hands-on exercises in a laboratory setting. By the end of the course, participants will have a comprehensive understanding of MATLAB's features and will be able to use it effectively for solving real-world data science problems as well as automating their workflows.
Assessments will be conducted throughout the training to monitor progress.
Format of the Course
- The course combines theoretical discussions with practical exercises, including case studies, code reviews, and hands-on implementation.
Note
- Practice sessions will use pre-arranged sample data report templates. If you have specific requirements, please contact us to arrange them.
Machine Learning for Data Science with Python
21 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at intermediate-level data analysts, developers, or aspiring data scientists who wish to apply machine learning techniques in Python to extract insights, make predictions, and automate data-driven decisions.
By the end of this course, participants will be able to:
- Understand and differentiate key machine learning paradigms.
- Explore data preprocessing techniques and model evaluation metrics.
- Apply machine learning algorithms to solve real-world data problems.
- Use Python libraries and Jupyter notebooks for hands-on development.
- Build models for prediction, classification, recommendation, and clustering.
Accelerating Python Pandas Workflows with Modin
14 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at data scientists and developers who wish to use Modin to build and implement parallel computations with Pandas for faster data analysis.
By the end of this training, participants will be able to:
- Set up the necessary environment to start developing Pandas workflows at scale with Modin.
- Understand the features, architecture, and advantages of Modin.
- Know the differences between Modin, Dask, and Ray.
- Perform Pandas operations faster with Modin.
- Work with the full Pandas API and functions through Modin's parallel implementation.
Python and Spark for Big Data for Banking (PySpark)
14 Hours
Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python.
Target Audience: Intermediate-level professionals in the banking industry familiar with Python and Spark, seeking to deepen their skills in big data processing and machine learning.
Python Programming for Finance
35 Hours
Python is a programming language that has gained significant popularity in the financial sector. It has been adopted by major investment banks and hedge funds for developing a variety of financial applications, from core trading systems to risk management tools.
In this instructor-led, live training, participants will learn how to use Python to develop practical applications that address specific finance-related challenges.
By the end of this training, participants will be able to:
- Understand the fundamentals of the Python programming language
- Download, install, and maintain the best development tools for creating financial applications in Python
- Select and utilize the most appropriate Python packages and programming techniques to organize, visualize, and analyze financial data from various sources (CSV, Excel, databases, web, etc.)
- Build applications that solve problems related to asset allocation, risk analysis, investment performance, and more
- Troubleshoot, integrate, deploy, and optimize a Python application
Audience
- Developers
- Analysts
- Quants
Format of the course
- Part lecture, part discussion, exercises, and extensive hands-on practice
Note
- This training aims to provide solutions for some of the primary challenges faced by finance professionals. However, if you have a specific topic, tool, or technique that you would like to add or explore further, please contact us to arrange it.
GPU Data Science with NVIDIA RAPIDS
14 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at data scientists and developers who wish to use RAPIDS to build GPU-accelerated data pipelines, workflows, and visualizations, applying machine learning algorithms through libraries such as XGBoost and cuML.
By the end of this training, participants will be able to:
- Set up the necessary development environment to build data models with NVIDIA RAPIDS.
- Understand the features, components, and advantages of RAPIDS.
- Leverage GPUs to accelerate end-to-end data and analytics pipelines.
- Implement GPU-accelerated data preparation and ETL with cuDF and Apache Arrow.
- Learn how to perform machine learning tasks with XGBoost and cuML algorithms.
- Build data visualizations and execute graph analysis with cuXfilter and cuGraph.
Python and Spark for Big Data (PySpark)
21 Hours
In this instructor-led, live training in Uzbekistan, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.
By the end of this training, participants will be able to:
- Learn how to use Spark with Python to analyze Big Data.
- Work on exercises that mimic real-world cases.
- Use different tools and techniques for big data analysis using PySpark.
Stratio: Rocket and Intelligence Modules with PySpark
14 Hours
Stratio is a data-centric platform that integrates big data, AI, and governance into a single solution. The Rocket and Intelligence modules of Stratio facilitate rapid data exploration, transformation, and advanced analytics in enterprise settings.
This instructor-led, live training (conducted online or on-site) is designed for intermediate-level data professionals who wish to effectively utilize the Rocket and Intelligence modules in Stratio with PySpark, focusing on looping structures, user-defined functions, and advanced data logic.
By the end of this training, participants will be able to:
- Navigate and operate within the Stratio platform using the Rocket and Intelligence modules.
- Apply PySpark for data ingestion, transformation, and analysis.
- Use loops and conditional logic to manage data workflows and feature engineering tasks.
- Create and manage user-defined functions (UDFs) for reusable data operations in PySpark.
Format of the Course
- Interactive lecture and discussion sessions.
- Numerous exercises and practical activities.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange it.