Dataprep Fundamentals Training Course
Dataprep is a smart data service that facilitates the visual exploration, cleansing, and organisation of both structured and unstructured data, preparing it for analysis, reporting, and use in machine learning applications.
This instructor-led, live training (online or on-site) is designed for beginner to intermediate-level IT professionals who wish to gain the knowledge and practical skills required to effectively prepare data for analysis, ensuring accuracy, consistency, and reliability across diverse datasets.
By the end of this training, participants will be able to:
- Explain why data preparation is essential for high-quality, reliable data in analysis and modelling.
- Collect, clean, transform, and integrate real-world datasets through hands-on practice.
- Identify and resolve data-related challenges, discrepancies, and inconsistencies.
Course Format
- Interactive lectures and discussions.
- Abundant exercises and practice opportunities.
- Hands-on implementation in a live lab environment.
Course Customisation Options
- To request a customised training session for this course, please contact us to make arrangements.
Course Outline
Introduction
- Understanding the importance of data preparation in analytics and machine learning
- The data preparation pipeline and its role in the data lifecycle
- Exploring common challenges in raw data and their impact on analysis
Data Collection and Acquisition
- Data sources: databases, APIs, spreadsheets, text files, and more
- Techniques for collecting data and ensuring data quality during collection
- Collecting data from various sources
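As an illustration of checking data quality at collection time, the sketch below loads a small CSV extract and verifies that the expected columns are present before any rows are accepted. Dataprep itself is a visual tool, so this Python snippet is not its interface; the column names, the sample text, and the `load_rows` helper are hypothetical.

```python
import csv
import io

# Hypothetical CSV export, held in memory so the sketch runs without files.
raw = "id,name,signup_date\n1,Ana,2024-01-05\n2,,2024-02-11\n"

def load_rows(text, required):
    """Parse CSV text and fail early if a required column is absent."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

rows = load_rows(raw, required=["id", "name", "signup_date"])

# Flag records whose name is blank so they can be reviewed at the source.
blank_names = [row["id"] for row in rows if not row["name"].strip()]
```

Failing fast on a missing column is one possible design choice; a collection step could equally log the problem and continue with the remaining sources.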
Data Cleaning Techniques
- Identifying and handling missing values, outliers, and inconsistencies
- Dealing with duplicates and errors in the dataset
- Cleaning real-world datasets
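The cleaning topics above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the course's lab material: the sample records, the plausible age range, and the choice of median imputation are all hypothetical.

```python
from statistics import median

# Hypothetical sample records; None marks a missing value.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 2, "age": None},   # exact duplicate of the previous row
    {"id": 3, "age": 29},
    {"id": 4, "age": 31},
    {"id": 5, "age": 240},    # implausible value, treated as an outlier
]

def clean(records, field, low, high):
    """Drop exact duplicates, blank out-of-range values, impute with the median."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not mutated

    # Treat values outside the assumed plausible range as missing.
    for rec in unique:
        value = rec[field]
        if value is not None and not (low <= value <= high):
            rec[field] = None

    # Fill every missing value with the median of the observed ones.
    observed = [rec[field] for rec in unique if rec[field] is not None]
    fill = median(observed)
    for rec in unique:
        if rec[field] is None:
            rec[field] = fill
    return unique

cleaned = clean(rows, "age", low=0, high=120)
```

Whether an outlier should be removed, capped, or imputed depends on the dataset; the range check here is only one simple option.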
Data Transformation and Standardisation
- Data normalisation and standardisation techniques
- Handling categorical data: encoding, binning, and feature engineering
- Transforming raw data into usable formats
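To make the transformation topics concrete, the following sketch shows min-max normalisation, z-score standardisation, and one-hot encoding using only the Python standard library. The function names are illustrative and the handling of constant columns (returning zeros) is an assumption.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Centre values on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def one_hot(labels):
    """Encode each label as a 0/1 vector, one position per sorted category."""
    categories = sorted(set(labels))
    return [[1 if label == cat else 0 for cat in categories] for label in labels]

scaled = min_max([10, 20, 30])
standardised = z_score([2, 4, 6])
encoded = one_hot(["red", "blue", "red"])
```

Min-max scaling is sensitive to extreme values, so it is usually applied after outliers have been handled; standardisation is the more common choice when a model assumes roughly centred inputs.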
Data Integration and Aggregation
- Merging and combining datasets from different sources
- Resolving data conflicts and aligning data types
- Techniques for data aggregation and consolidation
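The integration steps listed above (merging sources, aligning data types, aggregating) can be sketched as follows. The two extracts, their field names, and the `merge_and_total` helper are hypothetical; the example assumes one source stores the key as a string and the other as an integer.

```python
from collections import defaultdict

# Hypothetical extracts from two sources; the first stores ids as strings.
customers = [
    {"customer_id": "1", "region": "North"},
    {"customer_id": "2", "region": "South"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 75.5},
    {"customer_id": 3, "amount": 10.0},  # no matching customer record
]

def merge_and_total(customers, orders):
    """Join orders to customers on customer_id and total the amounts per region."""
    # Align the key type before joining: cast the string ids to int.
    region_by_id = {int(c["customer_id"]): c["region"] for c in customers}
    totals = defaultdict(float)
    unmatched = []
    for order in orders:
        region = region_by_id.get(order["customer_id"])
        if region is None:
            unmatched.append(order)  # keep conflicts visible instead of dropping them
        else:
            totals[region] += order["amount"]
    return dict(totals), unmatched

totals, unmatched = merge_and_total(customers, orders)
```

Returning the unmatched rows separately is a deliberate choice in this sketch: silently discarding them would hide exactly the kind of conflict the integration step is meant to surface.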
Data Quality Assurance
- Methods for ensuring data quality and integrity throughout the process
- Implementing quality checks and validation procedures
- Case studies and practical applications of data quality assurance
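A quality check can be as simple as a table of rules applied to every record. The sketch below is one hypothetical way to express such validation procedures; the rule definitions, field names, and messages are assumptions made for illustration.

```python
def validate(records, rules):
    """Return (row_index, field, message) for every rule a record fails."""
    failures = []
    for index, record in enumerate(records):
        for field, (check, message) in rules.items():
            if not check(record.get(field)):
                failures.append((index, field, message))
    return failures

# Hypothetical rules: each field maps to a predicate and an error message.
rules = {
    "email": (lambda v: isinstance(v, str) and "@" in v, "must contain '@'"),
    "age": (lambda v: isinstance(v, int) and 0 <= v <= 120, "must be an integer in 0..120"),
}

records = [
    {"email": "a@example.com", "age": 30},
    {"email": "not-an-email", "age": 30},
    {"email": "b@example.com", "age": -5},
]

failures = validate(records, rules)
```

Collecting all failures rather than stopping at the first one makes it easier to report on overall data quality across a batch.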
Dimensionality Reduction and Feature Selection
- Understanding the need for dimensionality reduction
- Techniques such as PCA, feature selection, and reduction strategies
- Implementing dimensionality reduction techniques
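As a small illustration of feature selection, the sketch below drops columns whose variance falls below a threshold. Note that this is variance thresholding, a simple selection strategy, and not PCA; the column names, values, and threshold are hypothetical.

```python
from statistics import pvariance

def select_by_variance(columns, threshold):
    """Keep only the columns whose population variance exceeds the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > threshold
    }

# Hypothetical feature columns keyed by name.
columns = {
    "height_cm": [160, 172, 181, 169],
    "constant_flag": [1, 1, 1, 1],        # carries no information
    "score": [0.50, 0.51, 0.50, 0.51],    # almost constant
}

kept = select_by_variance(columns, threshold=0.01)
```

Because variance depends on the unit of measurement, a threshold like this is usually applied after features have been brought onto a comparable scale.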
Summary and Next Steps
Requirements
- Basic understanding of data concepts
Audience
- Data analysts
- Database administrators
- IT professionals
Open Training Courses require 5+ participants.
Testimonials (2)
The variety of the information shared and the clarity to explain terms in plain English.
Arisbe Mendoza - Fairtrade International
Course - GDPR Workshop
It's a hands-on session.
Vorraluck Sarechuer - Total Access Communication Public Company Limited (dtac)
Course - Talend Open Studio for ESB
Related Courses
Data Ethics
14 Hours
Data Ethics refers to the discipline focused on the responsible collection, utilization, and decision-making processes involving data, ensuring that human rights, privacy, transparency, and fairness are upheld.
This instructor-led training, available either online or on-site, is designed for public sector professionals who may have limited or no prior background in data ethics. It targets those who manage or govern data and seek to comprehend ethical risks, evaluate real-world dilemmas, and apply responsible data use principles in alignment with institutional values and public trust.
Upon completion of this training, participants will be equipped to:
- Define core concepts and frameworks within data ethics.
- Identify ethical risks and trade-offs associated with data collection, analysis, and deployment.
- Apply principles of transparency, consent, and fairness to practical scenarios.
- Integrate ethical reviews into governance or operational workflows.
Course Format
- Interactive lectures and discussions.
- Hands-on analysis of real-world data ethics case studies.
- Guided exercises centered on ethical evaluation and policy alignment.
Customization Options
- To request customized training for this course tailored to your department's workflows or internal tools, please contact us to arrange.
Data Integrity and Availability
14 Hours
Data Integrity and Availability focuses on guaranteeing that information stays precise, complete, consistent, and accessible whenever necessary, particularly within public sector settings that demand high trust.
This guided, live training session (available online or in-person) is designed for public sector staff tasked with managing or protecting data—regardless of their technical expertise—who aim to uphold the reliability, consistency, and accessibility of vital datasets and systems under their responsibility.
Upon completing this training, participants will be able to:
- Articulate and distinguish the core principles of integrity and availability throughout the data lifecycle.
- Identify and mitigate risks associated with data corruption, inconsistency, or unauthorized modifications.
- Develop data architectures that support high availability and ensure business continuity.
- Deploy policies and controls that foster long-term data reliability.
Course Format
- Engaging lectures and discussions.
- Practical assessment of data vulnerabilities and failure points.
- Supervised exercises centered on policy creation and incident prevention.
Customization Options
- For a tailored version of this course aligned with your department’s specific workflows or internal tools, please reach out to us to arrange a session.
Data Policies and Standards
14 Hours
Data Policies and Standards provides a structured approach to ensuring that government data is created, maintained, accessed, and utilized in a manner that is consistent, secure, and aligned with legal and ethical guidelines.
This instructor-led, live training (available online or onsite) targets public sector professionals responsible for establishing or applying data policies, regardless of their technical background, who aim to standardize, document, and enforce data practices across departments or systems.
By the end of this training, participants will be able to:
- Define and differentiate between data policies, standards, and procedures.
- Draft and evaluate data governance policies aligned with national and international frameworks.
- Promote consistent and high-quality data practices across teams and departments.
- Build a foundation for compliance, audit readiness, and trustworthy data systems.
Format of the Course
- Interactive lecture and discussion.
- Hands-on drafting of sample policies and standards.
- Guided evaluation of existing data workflows and controls.
Course Customization Options
- To request a customized training for this course based on your department's workflows or internal tools, please contact us to arrange.
Data Strategy
14 Hours
A Data Strategy serves as the long-term blueprint for how an organization manages, leverages, and invests in data to fulfill its mission, enhance public services, and maintain accountability.
This instructor-led training, available either online or on-site, is designed for public sector professionals who have limited or emerging experience in data strategy but play a role in shaping or influencing strategic decisions. The program aims to help participants develop sustainable, mission-aligned data strategies across their organization or department.
Upon completion of this training, participants will be able to:
- Identify the core components of a comprehensive data strategy.
- Align data initiatives with organizational goals and public value.
- Create roadmaps for data governance, infrastructure, workforce skills, and innovation.
- Assess maturity levels and track progress toward becoming a data-driven organization.
Course Format
- Interactive lectures and group discussions.
- Practical exercises to develop strategy components and roadmaps.
- Guided analysis of public sector case studies and strategic frameworks.
Customization Options
- To arrange a customized training session tailored to your department’s workflows or internal tools, please contact us.
EBX5 for Developers
21 Hours
This instructor-led live training in Uzbekistan (online or onsite) is aimed at developers who wish to use EBX5 (TIBCO EBX) to enable a Master Data Management solution within their organization.
By the end of this training, participants will be able to:
- Interpret requirements and architect an MDM solution.
- Enable the management and integration of master data.
- Integrate and transfer data across multiple systems.
- Import data into EBX5 using match and merge logic.
- Design, create and document a data model that addresses their organization's business requirements.
- Integrate EBX5 with 3rd party services.
GDPR Workshop
7 Hours
This intensive one-day workshop is tailored for managers, department heads, and compliance professionals, enabling them to master the core principles of the General Data Protection Regulation. The curriculum addresses key topics such as GDPR fundamentals, rights of data subjects, data protection principles, consent mechanisms, obligations regarding breach notification, and the concept of privacy by design. Participants will gain access to practical frameworks for embedding GDPR compliance strategies throughout their organizations, thereby ensuring lawful data processing practices and fostering a robust culture of accountability in data protection.
Oracle GoldenGate
14 Hours
This instructor-led live training in Uzbekistan (online or onsite) is aimed at sysadmins and developers who wish to set up, deploy, and manage Oracle GoldenGate for data transformation.
By the end of this training, participants will be able to:
- Install and configure Oracle GoldenGate.
- Understand Oracle database replication using the Oracle GoldenGate tool.
- Understand the Oracle GoldenGate architecture.
- Configure and perform a database replication and migration.
- Optimize Oracle GoldenGate performance and troubleshoot issues.
Personal Data Protection Officer - Basic Level
21 Hours
Training Purpose
- Giving participants a systematic, comprehensive overview of how personal data protection works under Polish and European law.
- Providing practical knowledge regarding the new rules for processing personal data.
- Presenting the areas of highest legal risk associated with the implementation of the GDPR.
- Offering practical preparation for independently performing the duties of a Personal Data Protection Officer.
Personal Data Protection Officer - Advanced Level
14 Hours
Purpose of the Training
- Gaining practical knowledge of how to perform the tasks of the Data Protection Officer
- Gaining practical knowledge of how to audit and how to assess risk
- Providing practical knowledge about the new rules for the processing of personal data
Privacy in Federal Institutions (Requirements under the Privacy Act)
7 Hours
Privacy in Federal Institutions is a foundational course focused on the Privacy Act and its requirements for protecting personal information in government operations.
This instructor-led, live training (online or onsite) is aimed at public sector professionals with limited or emerging experience in privacy legislation who manage or process citizen data and wish to ensure compliance with the Privacy Act and related federal standards.
By the end of this training, participants will be able to:
- Understand the key provisions and principles of the Privacy Act.
- Identify personal information and handle it in accordance with legal obligations.
- Develop and implement privacy-compliant practices in day-to-day operations.
- Respond effectively to access to information and correction requests.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of policy scenarios in public sector contexts.
- Guided exercises focused on compliance, documentation, and reporting.
Course Customization Options
- To request a customized training for this course based on your department's workflows or internal tools, please contact us to arrange.
Microsoft Purview: Data Governance and Compliance
14 Hours
This live, instructor-led training delivered in Uzbekistan (online or onsite) targets data professionals at beginner, intermediate, and advanced levels who seek to utilize Microsoft Purview to bolster their data governance and compliance efforts.
By the conclusion of this training, participants will be able to:
- Install and configure Microsoft Purview.
- Implement data governance and compliance policies.
- Utilize data discovery and classification features.
- Monitor and manage data compliance.
Talend Administration Center (TAC)
14 Hours
This instructor-led, live training in Uzbekistan (online or on-site) is designed for system administrators, data scientists, and business analysts who aim to set up Talend Administration Center to deploy and manage organizational roles and tasks.
By the end of this training, participants will be able to:
- Install and configure Talend Administration Center.
- Understand and apply core Talend management principles.
- Build, deploy, and execute business projects or tasks within Talend.
- Monitor dataset security and develop business routines based on the TAC framework.
- Gain a broader understanding of big data applications.
Talend Big Data Integration
28 Hours
This instructor-led, live training in Uzbekistan (online or on-site) is designed for technical professionals who wish to deploy Talend Open Studio for Big Data to simplify the process of reading and analyzing large-scale data.
Upon completion of this training, participants will be able to:
- Install and configure Talend Open Studio for Big Data.
- Connect to Big Data systems such as Cloudera, HortonWorks, MapR, Amazon EMR, and Apache.
- Understand and configure Open Studio's Big Data components and connectors.
- Set parameters to automatically generate MapReduce code.
- Utilize Open Studio's drag-and-drop interface to execute Hadoop jobs.
- Prototype Big Data pipelines.
- Automate Big Data integration projects.
Talend Data Stewardship
14 Hours
This instructor-led, live training in Uzbekistan (online or on-site) is designed for beginner to intermediate-level data analysts who aim to deepen their knowledge and enhance their skills in managing and improving data quality using Talend Data Stewardship.
By the end of this training, participants will be able to:
- Gain a comprehensive understanding of the role of data stewardship in maintaining high-quality data.
- Leverage Talend Data Stewardship to manage data quality tasks effectively.
- Create, assign, and manage tasks within Talend Data Stewardship, including customising workflows.
- Utilise the tool's reporting and monitoring features to track data quality progress and stewardship initiatives.
Talend Open Studio for ESB
21 Hours
In this instructor-led, live training in Uzbekistan, participants will learn how to use Talend Open Studio for ESB to create, connect, mediate, and manage services and their interactions.
By the end of this training, participants will be able to:
- Integrate, enhance, and deploy ESB technologies as unified packages across diverse deployment environments.
- Understand and leverage the most commonly used components within Talend Open Studio.
- Integrate any application, database, API, or web service seamlessly.
- Effortlessly connect heterogeneous systems and applications.
- Embed existing Java code libraries to extend project capabilities.
- Utilize community-driven components and code to enhance and expand projects.
- Quickly integrate systems, applications, and data sources using the intuitive drag-and-drop interface of the Eclipse environment.
- Reduce development time and maintenance costs by generating optimized, reusable code.