Freelancer.nl | Masoud, Data Engineer & Data Scientist

About this freelancer

Data Engineer & Data Scientist | Azure Databricks, Python, SQL and Machine Learning

I help organizations reliably collect, structure, and analyze data. My primary strength is data engineering with Azure Databricks, Python, PySpark, SQL, Delta Lake, and Azure Data Factory. I also have a strong background in statistics, mathematics, and machine learning, so I can not only build data pipelines but also turn data into reliable models and actionable information for decision-making.

I can assist with:

Building and improving batch and streaming data pipelines
ETL/ELT processes and Bronze, Silver and Gold layers
Data quality, validation, deduplication and reconciliation
Python and SQL analyses of large and complex datasets
Exploratory Data Analysis and Statistical Analysis
Hypothesis testing, regression, classification and clustering
Predictive modeling and model validation
NLP, text analysis and AI-ready datasets
Dashboard and reporting datasets for Power BI and Spotfire
CI/CD, pytest, GitHub Actions and Databricks Asset Bundles

At ASML, I developed predictive models for equipment failures with 83% held-out accuracy. This work contributed to an estimated annual cost saving of €3.5 million. Additionally, I developed an NLP and failure-mode clustering pipeline for over 12,000 operational texts, reducing manual review by 60%.

My previous experience in the banking sector includes SQL-based data processing, ETL, risk assessment, reporting, reconciliation, data quality, and working with sensitive financial data.

Thanks to my MSc in Data Science and my background in applied mathematics, I can carefully apply statistical methods and clearly explain results. I work in a structured manner, document my solutions, and communicate easily with both technical and business stakeholders.

Suitable assignments:

Azure Databricks and PySpark projects
Data engineering and data platform development
Data cleaning, modeling and quality improvement
Python and SQL data analysis
Statistical analyses and hypothesis testing
Predictive analytics and machine learning prototypes
NLP and text analysis
Analytics and dashboard data products

Available for 24–36 hours per week, remote or hybrid. Communication: fluent English; Dutch at B1/B2 level and actively improving.

Education

2023 — 2025

MSc Computer Science — Data Science Track

Utrecht University

2017 — 2022

PhD Mathematics — Optimization & Game Theory

Payam Noor University

Work & Experience

01-10-2025 — present

Independent Data Engineering & Data Science Projects

Personal portfolio / MasouData

Develop practical data engineering and data science solutions with Python, PySpark, SQL, and Azure Databricks. Build, among other things, real-time Kafka-to-Databricks pipelines, Spark Structured Streaming workflows, Bronze/Silver/Gold lakehouse layers, and Azure Data Factory processes for incremental data ingestion. Implement data quality checks such as schema validation, deduplication, reconciliation, error handling, and source traceability. Use Databricks Asset Bundles, GitHub Actions, and pytest for reproducible configuration, testing, and CI/CD. Also work on machine learning, NLP, and statistical projects, including semantic clustering with embeddings and LLM-assisted labelling, predictive modelling, model validation, and Bayesian hierarchical modelling. Translate technical results into analysis-ready datasets, interpretable models, and clear business insights. Earned the Databricks Certified Data Engineer Associate certification in May 2026.

03-02-2025 — 30-09-2025

Data Scientist Intern — Predictive Analytics, NLP & Databricks

ASML

Developed reusable Python, PySpark, and SQL workflows on Azure Databricks for processing, validating, and analyzing operational data. Developed and validated predictive maintenance models with XGBoost and Random Forest. The models achieved 83% held-out accuracy and contributed to an estimated annual cost saving of €3.5 million. Also built an NLP enrichment and failure-mode clustering pipeline for more than 12,000 unstructured operational texts. This reduced manual review by 60% and made recurring failure modes easier to analyze. Applied data quality checks, Git, GitHub Actions, pytest, CI/CD, deduplication, reconciliation, and source traceability. Collaborated with engineers, business controllers, and other stakeholders to translate requirements into reliable datasets, models, and Spotfire insights.

01-11-2023 — 17-07-2024

Teaching Assistant — Python, Data Analysis & Process Analysis

Utrecht University

Supported more than 50 students with Python programming, debugging, data analysis, reproducible workflows, and process analysis. Developed reusable code examples, helped students structure and validate their analyses, and explained statistical and technical concepts in accessible terms. This strengthened my skills in knowledge sharing, code review, problem diagnosis, and communication with people from different technical backgrounds.

01-09-2022 — 30-06-2022

Visiting Researcher — Responsible AI & ML for AML

Tilburg University

Researched legal and regulatory challenges in applying machine learning to AML, including interpretability, privacy, adversarial risks, and data leakage. Conducted theoretical research and interviews with Dutch compliance and legal professionals. This experience deepened my knowledge of responsible AI, explainability, sensitive data, and model use in regulated environments.

01-09-2015 — 01-09-2020

Data Analyst — Financial Analytics, ETL & Data Quality

Mehr Iran Bank

Developed SQL-based workflows for data extraction, ETL, reconciliation, analysis, and reporting on customer, transaction, repayment, credit risk, fraud, and AML data. Performed exploratory data analysis, trend analysis, risk evaluation, and data quality checks. Applied logistic regression, decision-tree methods, risk scoring, and threshold analysis, among other techniques, to identify risk patterns and support further review. Automated recurring data processing and reporting, reducing manual effort by 40%. Documented business rules, investigated missing or inconsistent records, and collaborated with risk, audit, compliance, and business stakeholders. This experience combined data analysis and statistical decision-making with practical SQL data engineering in a regulated environment.

Certifications

2026

Databricks Certified Data Engineer Associate

2025

Databricks Fundamentals

2024

AWS Educate Cloud Intro

Portfolio

Reviews

No reviews yet

€ 75 / hour

Location Utrecht
Category Development & IT

AI Services
Verified Email, Phone
Member since 14-06-2026

My Skills

Data Engineering, Data Scientist, Python, SQL, Azure Data Engineer, Databricks, PySpark, Statistics, Machine Learning, Data Analysis, ETL Specialist, Data Modelling, Microsoft Azure, Data Quality Management, Kafka, Power BI Services, Docker, CI/CD Pipelines, Natural Language Processing, Predictive Analytics, Kubernetes