Data Engineering on Microsoft Azure (DP-203/DP-203T00)
In this course, the student will learn how to implement and manage data engineering workloads on Microsoft Azure, using Azure services such as Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Stream Analytics, Azure Databricks, and others. The course focuses on common data engineering tasks such as orchestrating data transfer and transformation pipelines, working with data files in a data lake, creating and loading relational data warehouses, capturing and aggregating streams of real-time data, and tracking data assets and lineage.
After completing this course, you’ll be able to design, build, and manage end-to-end data engineering solutions on Microsoft Azure. You’ll gain practical experience in orchestrating data pipelines, transforming data for analytics, and applying best practices in security and monitoring.
- Implement Azure data storage solutions, including Azure Data Lake Storage Gen2 and Azure Synapse Analytics
- Build and manage data pipelines for batch and streaming workloads
- Transform and analyze data using Spark and Databricks
- Secure data environments with Azure authentication and encryption
- Monitor, troubleshoot, and optimize data solutions for performance
This course is ideal for data engineers, data architects, business intelligence professionals, and anyone responsible for designing or implementing data solutions in the Microsoft Azure ecosystem. Data analysts and data scientists seeking hands-on experience with Azure tools will also benefit.
Before attending, learners should have:
- An understanding of cloud computing and core data concepts
- Experience working with data solutions
- Recommended: AZ-900T00 Microsoft Azure Fundamentals, DP-900T00 Microsoft Azure Data Fundamentals, or both
The course covers the following topics:
- What is data engineering?
- Key data engineering concepts
- Data engineering in Microsoft Azure
- Understand and enable Azure Data Lake Storage Gen2
- Compare Azure Data Lake Storage Gen2 to Azure Blob Storage
- Stages for processing big data
- Use Data Lake Storage Gen2 in analytics workloads
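
To ground the storage topics above, here is a minimal PySpark sketch of reading files from Data Lake Storage Gen2. The storage account (`mydatalake`), container (`data`), and file path are invented placeholders, and authentication is assumed to already be configured on the Spark environment.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-gen2-read").getOrCreate()

# ADLS Gen2 paths use the abfss:// scheme:
#   abfss://<container>@<account>.dfs.core.windows.net/<path>
path = "abfss://data@mydatalake.dfs.core.windows.net/raw/sales/*.csv"

df = spark.read.option("header", "true").csv(path)
df.printSchema()
print(f"rows: {df.count()}")
```
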
- What Azure Synapse Analytics is
- How it works
- When to use it
- Capabilities and use cases
- Query files using a serverless SQL pool
- Create external database objects
- Transform data files with CREATE EXTERNAL TABLE AS SELECT (CETAS) statements
- Encapsulate transformations in a stored procedure
- Include transformations in a pipeline
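
As a hedged illustration of the serverless SQL pool topics above, the following Python sketch uses pyodbc to submit an ad hoc OPENROWSET query and a CETAS statement. The workspace endpoint, database, and the external data source and file format objects (created as covered under "Create external database objects") are all placeholder names.

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"  # serverless endpoint
    "Database=salesdb;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,  # CETAS is DDL; run it outside an explicit transaction
)
cur = conn.cursor()

# Ad hoc query over Parquet files in the lake with OPENROWSET.
cur.execute("""
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/data/raw/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales;
""")
for row in cur.fetchall():
    print(row)

# CETAS: persist a transformation back to the lake as an external table.
cur.execute("""
    CREATE EXTERNAL TABLE curated.SalesByRegion
    WITH (
        LOCATION = 'curated/sales_by_region/',
        DATA_SOURCE = sales_lake,        -- assumed external data source
        FILE_FORMAT = parquet_format     -- assumed external file format
    )
    AS
    SELECT Region, SUM(Amount) AS TotalAmount
    FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/data/raw/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales
    GROUP BY Region;
""")
```

Wrapping such statements in a stored procedure, and calling that procedure from a pipeline activity, is how the last two topics above build on this pattern.
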
- Lake database concepts
- Explore templates
- Create and use lake databases
- Authentication methods
- Manage users and permissions
- Get to know Spark
- Use Spark in Synapse Analytics
- Analyze and visualize data
- Modify and save DataFrames
- Partition data files
- Transform data with SQL
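
The Spark topics above come together in a short PySpark sketch: load raw files, modify the DataFrame, run a SQL transformation over a temporary view, and save the result as partitioned files. Paths and column names are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, year

spark = SparkSession.builder.appName("transform-sales").getOrCreate()

df = (spark.read.option("header", "true")
      .csv("abfss://data@mydatalake.dfs.core.windows.net/raw/sales/*.csv"))

# Modify the DataFrame: cast types and derive a partition column.
df = (df.withColumn("Amount", col("Amount").cast("double"))
        .withColumn("OrderDate", col("OrderDate").cast("date"))
        .withColumn("Year", year(col("OrderDate"))))

# Transform data with SQL via a temporary view.
df.createOrReplaceTempView("sales")
totals = spark.sql("""
    SELECT Year, Region, SUM(Amount) AS TotalAmount
    FROM sales
    GROUP BY Year, Region
""")

# Partition the output by year for efficient downstream reads.
(totals.write.mode("overwrite")
       .partitionBy("Year")
       .parquet("abfss://data@mydatalake.dfs.core.windows.net/curated/sales_by_region/"))
```
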
- Understand Delta Lake
- Create Delta Lake and catalog tables
- Work with streaming data
- Use Delta Lake in a SQL pool
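
A hedged Delta Lake sketch, assuming a recent Delta Lake runtime: create a catalog table, apply an ACID update, time-travel to an earlier version, and read the table as a stream. Table and path names are placeholders. For the SQL pool topic, note that serverless SQL pools can query the same files with OPENROWSET(..., FORMAT = 'DELTA').

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("delta-demo").getOrCreate()

# Create a catalog table backed by Delta files.
df = spark.read.parquet("abfss://data@mydatalake.dfs.core.windows.net/curated/sales/")
df.write.format("delta").mode("overwrite").saveAsTable("sales_delta")

# ACID update through the DeltaTable API.
tbl = DeltaTable.forName(spark, "sales_delta")
tbl.update(condition="Region = 'EMEA'", set={"Region": "'Europe'"})

# Inspect the transaction log; each entry is a time-travel target.
spark.sql("DESCRIBE HISTORY sales_delta").show(truncate=False)
v0 = spark.sql("SELECT * FROM sales_delta VERSION AS OF 0")
print(v0.count())

# Read the table as a streaming source (Spark 3.3+ for availableNow).
query = (spark.readStream.table("sales_delta")
         .writeStream.format("console")
         .trigger(availableNow=True)
         .start())
query.awaitTermination()
```
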
- Understand pipelines
- Create pipelines in Synapse Studio
- Define and run data flows
- Use Synapse Notebooks in pipelines
- Parameterize notebooks
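
To illustrate notebook parameterization in a pipeline, here is a sketch of a Synapse notebook; the names are invented, and `spark` is the session predefined in a Synapse notebook. One cell is designated the parameters cell in Synapse Studio, and the pipeline's Notebook activity overrides its values at run time.

```python
# --- parameters cell (defaults; a pipeline Notebook activity overrides these) ---
load_date = "2024-01-01"
source_folder = "raw/sales"

# --- processing cell ---
from notebookutils import mssparkutils

path = f"abfss://data@mydatalake.dfs.core.windows.net/{source_folder}/{load_date}/"
row_count = spark.read.parquet(path).count()

# A child notebook can be called with arguments (name and timeout are placeholders):
# result = mssparkutils.notebook.run("validate_load", 300, {"load_date": load_date})

# Return a value to the calling pipeline or notebook.
mssparkutils.notebook.exit(str(row_count))
```
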
- Design schemas and create tables
- Load staging, dimension, time, and fact tables
- Handle slowly changing dimensions
- Perform post-load optimization
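
As a sketch of the loading topics above, the following Python code submits T-SQL to a dedicated SQL pool: a COPY statement to bulk-load a staging table from the lake, then an UPDATE + INSERT pair implementing a Type 2 slowly changing dimension. All endpoint and object names are placeholders, and lake access is assumed to be authorized for the connected identity.

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;"   # dedicated SQL pool endpoint
    "Database=salesdw;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)
cur = conn.cursor()

# Bulk-load the staging table straight from the lake.
cur.execute("""
    COPY INTO stg.Customer
    FROM 'https://mydatalake.dfs.core.windows.net/data/staging/customer/*.parquet'
    WITH (FILE_TYPE = 'PARQUET');
""")

# Type 2 SCD, step 1: expire current dimension rows whose attributes changed.
cur.execute("""
    UPDATE dim.Customer
    SET CurrentFlag = 0, EndDate = GETDATE()
    WHERE CurrentFlag = 1
      AND EXISTS (SELECT 1 FROM stg.Customer s
                  WHERE s.CustomerID = dim.Customer.CustomerAltKey
                    AND s.City <> dim.Customer.City);
""")

# Step 2: insert new versions of changed rows, plus brand-new customers.
cur.execute("""
    INSERT INTO dim.Customer (CustomerAltKey, Name, City, StartDate, CurrentFlag)
    SELECT s.CustomerID, s.Name, s.City, GETDATE(), 1
    FROM stg.Customer s
    LEFT JOIN dim.Customer d
      ON d.CustomerAltKey = s.CustomerID AND d.CurrentFlag = 1
    WHERE d.CustomerAltKey IS NULL;
""")
```
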
- Query a data warehouse
- Scale compute resources and pause compute
- Manage workloads and review recommendations
- Use dynamic management views to troubleshoot queries
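
A minimal troubleshooting sketch for the monitoring topics above: query the sys.dm_pdw_exec_requests dynamic management view on a dedicated SQL pool to find the longest-running recent requests. Connection details are placeholders.

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;"
    "Database=salesdw;"
    "Authentication=ActiveDirectoryInteractive;",
)
cur = conn.cursor()

# Longest-running recent requests on the dedicated SQL pool.
cur.execute("""
    SELECT TOP 10 request_id, [status], submit_time, total_elapsed_time, command
    FROM sys.dm_pdw_exec_requests
    ORDER BY total_elapsed_time DESC;
""")
for r in cur.fetchall():
    print(r.request_id, r.status, r.total_elapsed_time)
```
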
- Apply network security and Conditional Access
- Configure authentication, column-level and row-level security, and Dynamic Data Masking
- Implement encryption
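
Two of the security topics above can be sketched in T-SQL submitted from Python; the table, column, and role names are placeholders, and the connection setup matches the earlier pyodbc examples.

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;"
    "Database=salesdw;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)
cur = conn.cursor()

# Column-level security: the analyst role may read only non-sensitive columns.
cur.execute("GRANT SELECT (CustomerAltKey, City) ON dim.Customer TO analyst_role;")

# Dynamic Data Masking: mask email addresses for non-privileged users.
cur.execute("""
    ALTER TABLE dim.Customer
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
""")
```
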
- Understand HTAP patterns
- Azure Synapse Link overview
- Implement Azure Synapse Link for Cosmos DB and for SQL
- Enable analytical stores and linked services
- Query Cosmos DB with Spark and Synapse SQL
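
A hedged sketch of the Synapse Link topics above: reading the Cosmos DB analytical store from a Synapse Spark pool via a linked service. The linked service name (`CosmosDbLink`) and container (`Orders`) are placeholders, Synapse Link is assumed to be enabled on the account, and `spark` is the session predefined in a Synapse notebook. Serverless SQL can reach the same analytical store through OPENROWSET with the CosmosDB provider.

```python
# Read the Cosmos DB analytical store (no impact on transactional throughput).
df = (spark.read
      .format("cosmos.olap")
      .option("spark.synapse.linkedService", "CosmosDbLink")  # placeholder
      .option("spark.cosmos.container", "Orders")             # placeholder
      .load())

df.createOrReplaceTempView("orders")
spark.sql("SELECT status, COUNT(*) AS order_count FROM orders GROUP BY status").show()
```
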
- Understand data streams and event processing
- Explore window functions
- Ingest data using Azure Stream Analytics with Synapse
- Configure inputs, outputs, and queries
- Run jobs for ingestion
- Visualize real-time data with Power BI
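
Azure Stream Analytics jobs are written in a SQL-like query language rather than Python, so the sketch below is a query string you would paste into the job's query editor: a tumbling-window aggregation over an Event Hubs input, landing in a Synapse output that Power BI can then visualize. The input and output aliases are placeholders matching the job's configured endpoints.

```python
# Sketch of a Stream Analytics query: 60-second tumbling-window aggregates.
ASA_QUERY = """
SELECT
    DeviceId,
    COUNT(*) AS Readings,
    AVG(Temperature) AS AvgTemperature,
    System.Timestamp() AS WindowEnd
INTO [synapse-output]
FROM [eventhub-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY DeviceId, TumblingWindow(second, 60)
"""
```
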
- Get started with Azure Databricks
- Identify workloads and key concepts
- Data governance with Unity Catalog and Microsoft Purview
- Ingest and explore data
- Use DataFrame APIs for analysis
- Create Spark clusters and notebooks
- Process and visualize data
- Get started with Delta Lake
- Manage ACID transactions and schema enforcement
- Use data versioning, time travel, and integrity features
- Delta Live Tables for ingestion and real-time processing
- Azure Databricks Workflows: components, benefits, and deployment
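
To close the loop on the Databricks topics, here is a minimal sketch of a Delta Live Tables pipeline definition in Python. It runs under a Databricks DLT pipeline rather than as a standalone script; `spark` is provided by the runtime, and the landing path is a placeholder.

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw sales ingested incrementally with Auto Loader.")
def raw_sales():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("abfss://data@mydatalake.dfs.core.windows.net/landing/sales/"))

@dlt.table(comment="Cleaned sales with a basic data-quality expectation.")
@dlt.expect_or_drop("valid_amount", "Amount > 0")
def clean_sales():
    return (dlt.read_stream("raw_sales")
            .withColumn("Amount", col("Amount").cast("double")))
```

A Databricks Workflow (job) would then schedule this pipeline alongside other tasks such as notebooks or scripts.
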