Istella22 Dataset

Mirror of the original Istella blog post.

Istella is glad to release the Istella22 Dataset to the public.

To use the dataset, you must read and accept the Istella22 License Agreement. By using the dataset, you agree to be bound by the terms of the license: the Istella22 dataset is licensed for non-commercial use only.

Istella22

Neural approaches that use pre-trained language models are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their effectiveness compared to feature-based Learning-to-Rank (LtR) methods has not yet been well established. A major reason is that existing LtR benchmarks that contain query-document feature vectors do not contain the raw query and document text needed for neural models. On the other hand, the benchmarks often used for evaluating neural models (e.g., MS MARCO, TREC Robust) provide text but do not provide query-document feature vectors.

The Istella22 dataset enables such comparisons by providing both query/document text and the strong query-document feature vectors used by an industrial search engine. The dataset consists of a comprehensive corpus of 8.4M web documents, a collection of query-document pairs with 220 hand-crafted features each, graded relevance judgments on a five-level scale, and a set of 2,198 textual queries used for testing purposes.
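As a rough illustration of the feature-based side of the dataset, the sketch below loads query-document feature vectors, assuming they are distributed in the SVMlight/LETOR format used by earlier Istella LtR releases; the file name `train.svm` is hypothetical and the actual layout of the download may differ.

```python
# Minimal sketch, assuming SVMlight/LETOR-formatted feature files.
# "train.svm" is a hypothetical file name, not the actual distribution layout.
from sklearn.datasets import load_svmlight_file

# X: sparse matrix of 220 hand-crafted features per query-document pair
# y: graded relevance labels (assumed 0-4 on the five-level scale)
# qids: query identifier for each row
X, y, qids = load_svmlight_file("train.svm", query_id=True)

print(X.shape)  # (number of query-document pairs, 220)
print(y[:10])   # relevance grades for the first few pairs
```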

Istella22 enables a fair evaluation of traditional learning-to-rank and transfer ranking techniques on the same data: LtR models exploit the feature-based representations of the training samples, while pre-trained transformer-based neural rankers can be evaluated on the corresponding textual content of queries and documents.
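For the feature-based side, one possible setup (an illustrative sketch, not the exact configuration used in the paper) is to train a LambdaMART-style ranker on the 220-dimensional feature vectors with LightGBM; file names and hyperparameters below are placeholders.

```python
# Illustrative sketch: training a LambdaMART ranker on Istella22-style
# feature vectors with LightGBM. File names and settings are hypothetical.
import numpy as np
import lightgbm as lgb
from sklearn.datasets import load_svmlight_file

X_train, y_train, qid_train = load_svmlight_file("train.svm", query_id=True)

# LGBMRanker needs per-query group sizes, in the order queries appear;
# this assumes rows belonging to the same query are contiguous in the file.
_, first_idx, counts = np.unique(qid_train, return_index=True, return_counts=True)
group_sizes = counts[np.argsort(first_idx)]

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=100)
ranker.fit(X_train, y_train, group=group_sizes)

scores = ranker.predict(X_train)  # per query-document ranking scores
```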

You can download the Istella22 dataset from the Istella website.

If you use the Istella22 dataset, we ask you to acknowledge Istella SpA and to cite the following publication in your research:

D. Dato, S. MacAvaney, F.M. Nardini, R. Perego, N. Tonellotto. 2022. The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. DOI: 10.1145/3477495.3531740