Job description

We are seeking a Databricks ETL Developer for our client in the public sector. This role is responsible for designing, developing, maintaining, and optimizing ETL (Extract, Transform, Load) processes in Databricks for data warehousing, data lakes, and analytics. The developer will work closely with data architects and business teams to ensure that data is transformed and moved efficiently to meet business needs, including handling Change Data Capture (CDC) and streaming data.

Tools used are:

- Azure Databricks, Delta Lake, Delta Live Tables, and Spark to process structured and unstructured data.

- Azure Databricks/PySpark (good Python/PySpark knowledge required) to transform raw data into the curated zone of the data lake.

- Azure Databricks/PySpark/SQL (good SQL knowledge required) to develop and troubleshoot transformations of curated data into FHIR.

Data design

Understand the requirements and recommend changes to the data models to support the ETL design.

Define primary keys, indexing strategies, and relationships that enhance data integrity and performance across layers.

Define the initial schemas for each data layer.

Assist with data modelling and with updating the source-to-target mapping documentation.

Document and implement schema validation rules to ensure incoming data conforms to expected formats and standards.

Design data quality checks within the pipeline to catch inconsistencies, missing values, or errors early in the process.

Proactively communicate with business and IT experts about any changes required to the conceptual, logical, and physical models, and communicate and review timelines, dependencies, and risks.

Understand the tables and relationships in the data model.

Create low-level design documents and test cases for ETL development.

Implement error handling, logging, and retry mechanisms, and handle data anomalies.

Create the workflow and pipeline design.

Develop high-quality ETL mappings, scripts, and notebooks.

Develop and maintain pipelines from the Oracle data source to Azure Delta Lake and FHIR.

Perform unit testing.

Ensure performance monitoring and improvement.

Troubleshoot performance and ETL issues, and log activity for each pipeline and transformation.

Review and optimize overall ETL performance.

Plan for go-live and production deployment.

Create production deployment steps.

Configure parameters and scripts for go-live, and test and review the instructions.

Create release documents and help build and deploy code across servers.

Review existing ETL processes and tools, and provide recommendations for improving performance and reducing ETL timelines.

Review infrastructure and remediate issues for overall process improvement.

Document work and share the end-to-end ETL design, troubleshooting steps, configuration, and script reviews.

Hybrid: minimum of 3 days per week at the office location.

End date: March 2026

Job requirements

Must Have Skills:

7+ years of ETL experience using tools and techniques such as Microsoft SSIS, stored procedures, and T-SQL

2+ years building Delta Lake, Databricks, and Azure Databricks pipelines

Strong knowledge of Delta Lake for data management and optimization.
Familiarity with Databricks Workflows for scheduling and orchestrating tasks.

2+ years of Python and PySpark

Solid understanding of the Medallion Architecture (Bronze, Silver, Gold) and experience implementing it in production environments.
Hands-on experience with CDC tools (e.g., GoldenGate) for managing real-time data.

SQL Server

Oracle

Assets:

Knowledge of FHIR is an asset.

Databricks ETL Developer (PL644)