My thesis

Project period: 8/2024 - 8/2025

**Team:**

  • Le Ngoc Thao, PhD
  • Tran Trung Kien, MSc
  • Pham Le Tu Nhi
  • Huynh Cao Khoi

**Role:**

  • Data Collection
  • Experiment Building
  • Presentation

This project is my graduation thesis. My teammate Nhi and I explored the field of Causality and investigated its applications in Machine Learning.

The core objective of this thesis is a replication study. We selected a published paper, conducted an in-depth analysis of its methodology, and reproduced its results from scratch. Beyond replication, we designed additional experiments to further probe the proposed method — evaluating its potential, identifying its strengths, and surfacing its limitations.

Our thesis was structured into two main phases:

  • Phase 1 – Breadth-First Search (Aug 2024 – Jan 2025): Surveyed the broader landscape of causality research, studied foundational concepts, and identified a target paper for replication.
  • Phase 2 – Depth-First Search (Feb 2025 – Aug 2025): Deep-dived into the selected paper, reproduced its experiments, and conducted further analysis to extend its findings.

During Phase 1, our supervisor, Mr. Kien, gave us a list of research topics to explore. Each week we held a meeting in which I presented a paper that had caught my interest. For each paper, we focused on extracting the following key information:

  • Core Idea: What problem does the paper address, and what does it aim to achieve?
  • Proposed Method: What approach is introduced, and what makes it novel?
  • Dataset: What datasets are used, and are they publicly available?
  • Results: What are the outcomes, and do they support the paper’s claims?

After surveying multiple topics such as interpretable machine learning, we converged on Causality as our focus. The field captivated us with its central idea: rather than relying solely on observed correlations, causality seeks to uncover the underlying cause-and-effect relationships that govern the world — offering a deeper and more principled understanding of data.

With our topic chosen, we continued surveying the literature while simultaneously building a solid foundation in the core concepts of causal inference. This eventually led us to an exciting intersection between causality and machine learning — specifically, its application to the Domain Generalization problem.

The paper we selected for replication is "Invariant Models for Causal Transfer Learning" by Rojas-Carulla, Schölkopf, Turner, and Peters (JMLR, 2018).


During Phase 2, we reimplemented the paper's experiments from scratch to reproduce its results. We then designed further experiments to probe the method:

Experiment 1: Reproducing Paper Results

The paper reports two main experiments: one on synthetic data and one on gene perturbation data, a semi-synthetic dataset from the bioinformatics domain.

Our experimental results were consistent with the paper's findings, showing that the ICM method performs well in environments with hard interventions.
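A toy example helps show what "hard interventions" do to a model. As we understand the paper, its central idea is to find a subset S of predictors for which the conditional distribution of Y given X_S stays the same across training tasks, and then to predict with only that subset in a new task. The sketch below illustrates that idea. It is neither the paper's code nor ours. Several parts of it are illustrative assumptions:

- the SCM (X1 → Y → X2, with X3 as pure noise);
- the helper names (`make_task`, `invariance_stat`, `invariant_subsets`);
- the invariance test, which is a permutation test on pooled regression residuals combining an ANOVA on the residuals with a Brown-Forsythe spread statistic;
- the threshold `alpha`;
- choosing among accepted subsets by pooled in-sample MSE rather than by validation error.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)


def make_task(n, intervene_x2, x1_mean):
    """Hypothetical toy SCM for one task: X1 -> Y -> X2, and X3 is pure noise."""
    x1 = rng.normal(x1_mean, 1.0, n)        # X1's distribution differs across tasks
    y = 2.0 * x1 + rng.normal(0.0, 1.0, n)  # invariant mechanism Y | X1
    if intervene_x2:
        x2 = rng.normal(0.0, 3.0, n)        # hard intervention: X2 no longer depends on Y
    else:
        x2 = y + rng.normal(0.0, 0.5, n)    # observational mechanism Y -> X2
    x3 = rng.normal(0.0, 1.0, n)
    return np.column_stack([x1, x2, x3]), y


def anova_f(values, labels):
    """One-way ANOVA F statistic of `values` grouped by task label."""
    groups = [values[labels == t] for t in np.unique(labels)]
    k, n = len(groups), len(values)
    grand = values.mean()
    between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (k - 1)
    within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - k)
    return between / within


def invariance_stat(r, labels):
    """Mean shift (ANOVA on residuals) plus spread shift (Brown-Forsythe on |r - median|)."""
    spread = np.empty_like(r)
    for t in np.unique(labels):
        m = labels == t
        spread[m] = np.abs(r[m] - np.median(r[m]))
    return anova_f(r, labels) + anova_f(spread, labels)


def pooled_residuals(X, y, subset):
    """Residuals of a least-squares fit of y on X[:, subset] plus intercept, pooled over tasks."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef


def invariant_subsets(X, y, labels, alpha=0.01, n_perm=300):
    """Return (subset, pooled MSE) for every subset whose residuals look task-invariant."""
    accepted = []
    d = X.shape[1]
    for size in range(d + 1):
        for subset in itertools.combinations(range(d), size):
            r = pooled_residuals(X, y, subset)
            observed = invariance_stat(r, labels)
            # Permutation p-value: how often shuffled task labels give a statistic this large
            hits = sum(invariance_stat(r, rng.permutation(labels)) >= observed
                       for _ in range(n_perm))
            if (1 + hits) / (1 + n_perm) > alpha:
                accepted.append((subset, float(np.mean(r ** 2))))
    return accepted


# Four hypothetical training tasks: (hard intervention on X2?, mean of X1)
tasks = [(False, -1.0), (True, 0.0), (False, 1.0), (True, 2.0)]
parts = [make_task(300, iv, mu) for iv, mu in tasks]
X = np.vstack([p[0] for p in parts])
y = np.concatenate([p[1] for p in parts])
labels = np.repeat(np.arange(len(tasks)), 300)

accepted = invariant_subsets(X, y, labels)
best = min(accepted, key=lambda item: item[1])[0]
print("accepted subsets:", [s for s, _ in accepted])
print("selected predictors:", best)
```

In this setup we would expect X1 (column 0) to be kept. We would expect X2 (column 1) to be dropped: it is a child of Y, and the hard intervention breaks its mechanism in half of the tasks, so any subset containing it yields residuals that differ across tasks. The irrelevant X3 may or may not be included, since it barely changes the residuals. We use a permutation test only to keep the sketch dependency-free beyond NumPy. The paper's own choice of statistical test may differ.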

Experiment 2: ICM vs cICM

Experiment 3: ICM for Air Quality Problem