Course Outline
Introduction to Apache Iceberg
- Overview of Apache Iceberg
- Review of basic concepts
Deep Dive into Iceberg Architecture
- In-depth analysis of Iceberg's table format
- Detailed architecture overview, including metadata and file layout
- Internals of schema and partition evolution
Advanced Installation and Configuration
- Configuring Iceberg for optimal performance in different environments
- Integration with various data processing engines
- Advanced setup: security, encryption, and access controls
- Setting up Iceberg in a distributed environment
Advanced Operations and Maintenance
- Managing large-scale Iceberg tables
- Implementing and managing complex schema changes
- Handling partition evolution and hidden partitioning
- Advanced CRUD operations with schema and partition changes
Query Optimization Techniques
- Techniques for reducing query latency
- Partition pruning and file pruning
- Metadata caching and optimization strategies
- Implementing and testing query optimization techniques
Performance Tuning for Large Datasets
- Optimizing performance for large-scale datasets
- Using Iceberg's built-in features for performance tuning
- Case studies on performance tuning in real-world scenarios
Advanced Data Migration and Integration
- Migrating complex data structures from other systems
- Integrating Iceberg with real-time data streams
- Migrating complex datasets and integrating real-time data streams
Reliability and Consistency
- Ensuring data consistency and integrity in distributed environments
- Implementing and managing transactional guarantees
- Handling failures and recovery mechanisms
- Implementing reliability and consistency features
Advanced Features and Customization
- Custom catalog implementations
- Extending Iceberg with custom features
- Implementing custom catalog and extending Iceberg functionalities
Data Governance and Compliance
- Implementing data governance policies
- Compliance with data regulations
- Managing audit trails and data lineage
- Implementing governance and compliance features
Summary and Next Steps
Requirements
- Familiarity with Apache Iceberg core concepts, basic operations, and table management
Audience
- Data engineers
- Data architects
- Data analysts
- Software developers