Quantifying Retrieval Quality in GraphRAG: A Schema-Agnostic Approach
In this paper, we propose a novel schema-agnostic framework for the automated generation of synthetic evaluation datasets from knowledge graphs (KGs). Unlike previous approaches, our framework establishes a rigorous, deterministic ground truth to specifically quantify retriever performance across nine distinct query categories, including multi-hop and aggregation tasks.
Evaluation of GraphRAG Strategies for Efficient Information Retrieval
Traditional RAG systems struggle to capture relationships and cross-references between different sources unless explicitly mentioned. This challenge is common in real-world scenarios, where information is often distributed and interlinked, making graphs a more effective representation. Our work provides a technical contribution through a comparative evaluation of retrieval strategies within GraphRAG.
Flight Load Factor Predictions based on Analysis of Ticket Prices and other Factors
The ability to forecast traffic and to size the operation accordingly is a determining factor for airports. However, to realise its full potential, forecasting needs to be considered as part of a holistic approach, closely linked to airport planning and operations. To ensure airport resources are used efficiently, accurate information about passenger numbers and their effects on the operation is essential. Therefore, this study explores machine learning capabilities enabling predictions of aircraft load factors.
Investigating a Feature Unlearning Bias Mitigation Technique for Cancer-type Bias in AutoPet Dataset
We proposed a feature unlearning technique to reduce cancer-type bias, which improved segmentation accuracy while promoting fairness across sub-groups, even with limited data.
Muppet: A Modular and Constructive Decomposition for Perturbation-based Explanation Methods
The topic of explainable AI has recently received attention driven by a growing awareness of the need for transparent and accountable AI. In this paper, we propose a novel methodology to decompose any state-of-the-art perturbation-based explainability approach into four blocks. In addition, we provide Muppet: an open-source Python library for explainable AI.
Development & Evaluation of Automated Tumour Monitoring by Image Registration Based on 3D (PET/CT) Images
Tumor tracking in PET/CT is essential for monitoring cancer progression and guiding treatment strategies. Traditionally, nuclear physicians manually track tumors, focusing on the five largest ones (PERCIST criteria), which is both time-consuming and imprecise. Automated tumor tracking can allow matching of the numerous metastatic lesions across scans, enhancing tumor change monitoring.
Robust ML Approach for Screening MET Drug Candidates in Combination with Immune Checkpoint Inhibitors
The present study highlights the significance of dataset size in ICI microbiota models and presents a methodology to enhance the performance of a multi-cohort-based ML approach.
Augment to Interpret: Unsupervised and Inherently Interpretable Graph Embeddings
In this paper, we study graph representation learning and show that data augmentation that preserves semantics can be learned and used to produce interpretations. Our framework, which we named INGENIOUS, creates inherently interpretable embeddings and eliminates the need for costly additional post-hoc analysis.
SANGEA: Scalable and Attributed Network Generation
In this paper, we present SANGEA, a sizeable synthetic graph generation framework that extends the applicability of any synthetic graph generator (SGG) to large graphs. By first splitting the large graph into communities, SANGEA trains one SGG per community, then links the community graphs back together to create a synthetic large graph.
TS-Relax: Interpreting Learned Representations for Time Series
Representation learning models are increasingly used, but explainable and trustworthy AI models are needed. This work presents the adaptation to time series of a representation interpretation method originally designed for images.
Comparison of Machine Learning Approaches for POD24 Prediction
Early identification of patients with relapsing follicular lymphoma (FL) is critical but remains elusive. We initiated a collaboration with the academic CALYM Carnot Institute, aiming to develop interpretable artificial intelligence (AI) models based on PET images to predict POD24.
The Building Blocks of a Responsible AI Practice: An Outlook on the Current Landscape
Responsible AI comes with the challenge of implementation. This survey aims to bridge the gap between principles and practice through a study of different approaches taken in the literature and the proposition of a foundational framework.
A Fair Classifier Embracing Triplet Collapse
In this paper, we study the behaviour of the triplet loss and show that it can be exploited to limit the biases created and perpetuated by machine learning models.
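As background for this entry, the standard triplet loss can be sketched as follows. This is the common textbook formulation with Euclidean distance, not necessarily the exact variant studied in the paper; the function name `triplet_loss` and the default `margin` value are illustrative assumptions.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    Textbook formulation with Euclidean distance; the name and the
    default margin are illustrative, not taken from the paper.
    """
    d_pos = math.dist(anchor, positive)  # distance anchor -> same-class sample
    d_neg = math.dist(anchor, negative)  # distance anchor -> other-class sample
    return max(0.0, d_pos - d_neg + margin)


# Well-separated triplet: the negative is far enough away, so the loss is zero.
separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Collapsed embeddings: every point maps to the same vector, so both
# distances are zero and the loss equals the margin.
collapsed = triplet_loss([1.0, 1.0], [1.0, 1.0], [1.0, 1.0])
```

When all embeddings collapse to a single point, both distances vanish and the loss settles at the margin, which is the degenerate regime the title refers to as triplet collapse.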
Dynamic Pairwise Wake Vortex Separations For Arrivals Using Predictive Machine Learning Models
Aircraft wake behaviour and meteorological information are monitored and processed using ML algorithms that determine the wake separation minimum reductions that can be safely applied between subsequent arriving aircraft.
Machine Learning Supporting Enhanced Optimized Spacing Delivery between Consecutive Departing Aircraft
This paper introduces the enhanced Optimised Spacing Delivery (OSD) tool, which builds on the OSD tool by using machine learning to make more accurate predictions of aircraft behaviour and wind on the initial departure path.
Calibrate to Interpret
Trustworthy machine learning drives much of the ML community's work to improve ML acceptance and adoption. In this paper, we show a first link between uncertainty and explainability by studying the relation between calibration and interpretation.
Automatic Parameter Tuning for Big Data Pipelines
Big data frameworks are generally combined into a pipeline, with each framework playing a different role. This makes tuning big data pipelines an important yet difficult task given the size of the search space. We propose to use a deep reinforcement learning algorithm to tune a fraud detection big data pipeline.
Multimodal Classifier For Space Target Recognition
We propose a multi-modal framework to tackle the SPARK Challenge by classifying satellites using RGB and depth images. Our framework is mainly based on Auto-Encoders to embed the two modalities in a common latent space in order to exploit redundant and complementary information between the two types of data.
AMI-Class: Towards a Fully Automated Multi-view Image Classifier
In this paper, we propose an automated framework for multi-view image classification tasks. The proposed framework is able to, all at once, train a model to find a common latent representation and perform data imputation, choose the best classifier and tune all necessary hyper-parameters.
Policy-Based Automated Compliance Checking
In this paper, we propose an automated policy-based compliance checking model and implement it using SHACL.
Estimating Expected Calibration Errors
Uncertainty in probabilistic classifiers' predictions is a key concern when models are used to support human decision-making, in broader probabilistic pipelines, or when sensitive automatic decisions have to be taken.
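For readers unfamiliar with the quantity named in the title, the widely used binned estimator of the expected calibration error (ECE) can be sketched as below. This is the common equal-width-bin estimator, offered only as an illustration and not necessarily the estimator studied in the paper; the function name, argument names, and default bin count are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - mean confidence|.

    `confidences` are predicted probabilities of the chosen class in [0, 1];
    `correct` flags whether each prediction was right. Equal-width bins are
    an illustrative choice, not taken from the paper.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Four predictions made with 0.9 confidence, three of them correct:
# accuracy is 0.75, so the model is over-confident by about 0.15.
ece_overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, True, False]
)
```

The estimate depends on the number and placement of bins, which is one reason estimating calibration error reliably is a research question in its own right.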
DAEMA: Denoising Autoencoder with Mask Attention
Missing data is a recurrent and challenging problem, especially when using machine learning algorithms for real-world applications. For this reason, missing data imputation has become an active research area, in which recent deep learning approaches have achieved state-of-the-art results. We propose DAEMA: Denoising Autoencoder with Mask Attention.
Anomaly Detection: How to Artificially Increase your F1-Score with a Biased Evaluation Protocol
Anomaly detection is a widely explored domain in machine learning. Many models are proposed in the literature, and compared through different metrics measured on various datasets. The most popular metrics used to compare performances are F1-score, AUC and AVPR...
A Framework Using Contrastive Learning for Classification with Noisy Labels
We propose a framework using contrastive learning as a pre-training task to perform image classification in the presence of noisy labels. Recent strategies, such as pseudo-labelling, sample selection with Gaussian Mixture models, and weighted supervised contrastive learning have been combined into a fine-tuning phase following the pre-training.