type: research project
Authors
Kruk Julia, Piechota Michał, Sekula Gaspar, Sieńko Zuzanna
Abstract
Performance in Information Retrieval (IR) depends heavily on the quality of embeddings. While dense embeddings excel at capturing complex patterns, their lack of interpretability limits user trust and control. Sparse representations, such as those produced by Sparse Autoencoders (SAEs), offer greater transparency by isolating distinct concepts, but may sacrifice retrieval accuracy. In this study, we compare multimodal (CLIP) and sparse (SAE) embeddings against more traditional dense, unimodal representations on image-based IR, evaluating both retrieval accuracy and computational efficiency. Our experiments reveal that the interpretable sparse and multimodal representations come at a cost in IR performance.