## Biography

dia.mohamad@gmail.com

I am currently a data scientist and machine learning course developer at the EPFL Extension School, where I am involved in producing online learning materials, designing and leading workshops and hackathons, and promoting the EPFL Extension School data science and digital skills to a wide audience. In 2012, I concurrently received the B.E. degree in Electrical and Computer Engineering and the B.A. degree in Economics, both with distinction, from the American University of Beirut (AUB), Lebanon. I received the M.Sc. degree in Communication Systems in 2014 and the Ph.D. degree in Computer and Communication Sciences in 2018, both from the Swiss Federal Institute of Technology (EPFL), Switzerland. Between 2018 and 2020, I was a research scientist at the Institute for Data Science (i4DS) at the University of Applied Sciences Northwestern Switzerland (FHNW), where I worked in collaboration with the European Space Agency (ESA). In 2017, I was a visiting researcher at Nokia Bell Labs, Germany, where I worked on the design of a novel coding scheme for high-speed fiber-optic communication systems. In 2014, I was an R&D engineer at the European Technology Center of SONY, Germany, working on the standardization of digital terrestrial TV broadcasting receivers. In 2011, I did my undergraduate internship at the University of California, Berkeley, U.S.A., where I contributed to the “Mobile Millennium” traffic-monitoring project. My CV is available here.

## Research Interests

My research interests lie at the interface between statistical inference, machine learning, coding theory, and the statistical physics of spin glasses. I was involved in developing new data science tools and applying deep learning techniques for the “Euclid” space mission project in order to investigate dark matter. My PhD research at EPFL's Information Processing Group (IPG) focused mainly on inference and learning over graphical models, including problems from error-correcting codes, compressed sensing, and community detection. My work spans both the practical and theoretical aspects of such problems: the design and analysis of optimal low-complexity message-passing algorithms, the application of statistical physics methods, the derivation of information-theoretic limits, and the development of rigorous proof techniques. (The word cloud is based on my 2018 research statement; powered by wordclouds.com.)

#### Astronomical Data Processing - Euclid Space Mission

The stunning discovery in the late 1990s that the expansion of the universe is accelerating, contrary to the previously prevailing belief in a decelerating expansion, has changed the modern perception of the cosmos and presented several challenges in astrophysics. Such acceleration can be attributed to the presence of a mysterious invisible “dark energy” that acts as a repulsive gravitational force, so that Einstein's general relativity continues to hold on the cosmological scale. “Euclid”, scheduled for launch by the European Space Agency (ESA), is the first satellite designed to map the geometry of the dark universe (dark matter and dark energy). It will provide images of 2 billion galaxies with unprecedented quality. The Euclid consortium includes 1400 scientists across Europe and the USA. My work in the astroinformatics group covers the crucial pre-launch period (2017-2020), with a focus on software and algorithmic development for the scientific ground-segment activities. My research within Euclid revolves around solving inverse problems in order to investigate dark matter using high-spatial-resolution imagery. This includes the development of new data science tools and the application of deep learning techniques to find patterns in cosmic structure.

#### Statistical Physics - Phase Transitions and Rigorous Predictions

Over the last century, statistical physics techniques have been developed to describe the behaviour of systems with a large number of degrees of freedom and to yield predictions that would be very difficult to guess. One of these techniques is the Replica method, which was conjectured to predict the asymptotic mutual information of a random graphical model and to detect the algorithmic and optimal phase transitions (see figure below). We prove that the Replica formula is exact in many problems that have been studied in the context of error-correcting codes, compressed sensing, and machine learning (mainly the random linear estimation and low-rank matrix factorization problems). Hence, we are able to derive rigorous information-theoretic limits for many open problems. Moreover, we prove that, for a large set of parameters, an efficient iterative algorithm called Approximate Message-Passing (AMP) is optimal in the Bayesian setting. Our proof technique is of independent interest, as it carries over to various inference problems. It exploits three essential ingredients: the Guerra interpolation method introduced in statistical physics, the analysis of the AMP algorithm through State Evolution (SE), and the theory of spatial coupling and threshold saturation in coding.
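To give a flavour of what an AMP iteration looks like for random linear estimation, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the Bayes-optimal algorithm analyzed in the work above: it swaps in a simple soft-thresholding denoiser, assumes a noiseless model $$y = Ax$$ with i.i.d. Gaussian entries of variance $$1/m$$, and the names `amp_soft_threshold`, `alpha`, and `demo` are hypothetical.

```python
import numpy as np


def soft_threshold(v, theta):
    """Component-wise soft-thresholding denoiser."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def amp_soft_threshold(A, y, alpha=1.5, n_iter=50):
    """AMP sketch for y = A x with a soft-thresholding denoiser.

    Assumes A has i.i.d. zero-mean entries of variance 1/m.
    alpha (a hypothetical tuning parameter) scales the threshold relative
    to the estimated effective noise level ||z|| / sqrt(m).
    """
    m, n = A.shape
    x = np.zeros(n)
    z = y.astype(float).copy()  # residual
    for _ in range(n_iter):
        tau = np.linalg.norm(z) / np.sqrt(m)  # effective noise estimate
        x = soft_threshold(x + A.T @ z, alpha * tau)
        # Onsager correction: previous residual times (number of nonzeros) / m.
        onsager = z * (np.count_nonzero(x) / m)
        z = y - A @ x + onsager
    return x


def demo(n=1000, m=500, k=50, seed=0):
    """Relative reconstruction error on a random k-sparse signal."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    x0 = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x0[support] = rng.normal(size=k)
    x_hat = amp_soft_threshold(A, A @ x0)
    return np.linalg.norm(x_hat - x0) / np.linalg.norm(x0)
```

The Onsager term is what distinguishes AMP from plain iterative thresholding; it is also what makes the scalar State Evolution recursion track the per-iteration error in the large-system limit.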

#### Spatial Coupling - Algorithmic Tools and Proof Techniques

Spatial coupling is a powerful graphical construction used to improve the performance of message-passing algorithms. It is the underlying principle behind the threshold saturation phenomenon (where the algorithmic threshold reaches the optimal one). This construction has been successfully applied to multiple graphical models ranging from LDPC codes to compressed sensing. Spatial coupling can be represented via a graphical model starting from the original factor graph. Assume that we have a factor graph of size $$N$$. We take several instances of this factor graph and place them next to each other on a chain of length $$\Gamma$$. We then locally couple the underlying factor graphs with a coupling window $$w$$ to obtain a bigger factor graph of size $$\Gamma \times N$$ (see figure below). In the resulting factor graph, each variable node is connected to the corresponding check nodes of the same underlying factor graph and to the check nodes of the neighboring factor graphs. This construction creates a spatial dimension, along the positions of the chain, that will help the algorithm. The second step in constructing efficient spatially coupled graphs is to introduce a seed at a certain position of the chain. This seed can be introduced as side information that helps the algorithm at the boundaries and initiates a “wave” that propagates inwards and boosts the performance. Interestingly, spatial coupling can be used both as a “construction technique” to boost the algorithmic performance and as a “proof technique” to compute some information-theoretic quantities. Therefore, even if the problem at hand does not provide the freedom of constructing a spatially coupled model in practice, one can still use spatial coupling for an auxiliary model. Intuitively speaking, since the low-complexity algorithm on the auxiliary model is optimal by the threshold saturation phenomenon, it is easier to compute the information-theoretic quantities on that model and then apply them to the underlying model.
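The first step of the construction (copying the factor graph along a chain and coupling neighboring copies) can be sketched as follows. This is a simplified, deterministic illustration, not the exact ensemble used in the publications: it assumes an open chain, a symmetric window in which positions $$r$$ and $$c$$ are coupled when $$|r - c| \le w$$, and it does not model the seed. The function names are hypothetical.

```python
import numpy as np


def coupling_pattern(gamma, w):
    """0/1 matrix whose entry (r, c) is 1 when chain positions r and c are coupled."""
    pos = np.arange(gamma)
    return (np.abs(pos[:, None] - pos[None, :]) <= w).astype(int)


def couple_factor_graph(H, gamma, w):
    """Biadjacency matrix (check nodes x variable nodes) of the coupled graph.

    H is the biadjacency matrix of the underlying factor graph. Block (r, c)
    of the result equals H when positions r and c are coupled and is zero
    otherwise, so each variable node keeps its own check nodes and gains the
    corresponding check nodes of the neighboring copies.
    """
    return np.kron(coupling_pattern(gamma, w), np.asarray(H))


# Toy underlying factor graph: 2 check nodes, 3 variable nodes.
H = np.array([[1, 1, 0],
              [0, 1, 1]])
G = couple_factor_graph(H, gamma=4, w=1)  # 8 check nodes, 12 variable nodes
```

Positions at the two ends of the open chain have fewer neighbors than those in the middle, which is the kind of boundary effect that the seed exploits to start the propagating “wave”.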

## Selected Publications

#### Journals

Note: Authors are listed in alphabetical and/or affiliation order.