Thumbnail: jekyll

GalaXC

on under Publications
2 minute read

GalaXCLogo

Joint work with Arnav Jain, Kushal Dave, Jian Jiao, Amit Singh, Bruce Zhang, Manik Varma

The task which GalaXC tries to solve is to annotate a document with the most relevant subset of labels from an extremely large label set - Extreme Classification. Extreme classification has been successfully applied to several real world web-scale applications such as web search, product recommendation, query rewriting, etc. GalaXC identifies two critical deficiencies in leading extreme classification algorithms. First, existing approaches generally assume that documents and labels reside in disjoint sets, even though in several applications, labels and documents cohabit the same space. Second, several approaches, albeit scalable, do not utilize various forms of metadata offered by applications, such as label text and label correlations. To remedy these, GalaXC presents a framework that enables collaborative learning over joint document-label graphs at massive scales, in a way that naturally allows various auxiliary sources of information, including label metadata, to be incorporated. GalaXC also introduces a novel label-wise attention mechanism to meld high-capacity extreme classifiers with its framework. An efficient end-to-end implementation of GalaXC is presented that could be trained on a dataset with 50M labels and 97M training documents in less than 100 hours on 4xV100 GPUs. This allowed GalaXC to not only scale to applications with several millions of labels, but also be up to 18% more accurate than leading deep extreme classifiers, while being upto 2-50x faster to train and 10x faster to predict on benchmark datasets. GalaXC is particularly well-suited to warm-start scenarios where predictions need to be made on data points with partially revealed label sets, and was found to be up to 25% more accurate than extreme classification algorithms specifically designed for warm start settings. In A/B tests conducted on the Bing search engine, GalaXC could improve the Click Yield (CY) and coverage by 1.52% and 1.11% respectively.

You can watch our conference presentation video here.

Please use this Bibtext in case you need to cite the work:

@InProceedings{Saini21,
	author       = "Saini, D. and Jain, A.~K. and Dave, K. and Jiao, J. and Singh, A. and Zhang, R. and Varma, M.",
	title        = "GalaXC: Graph neural networks with labelwise attention for extreme classification",
	booktitle    = "Proceedings of The ACM International World Wide Web Conference",
	month = "April",
	year = "2021",
	}

Please feel free to tinker with the code.

Extreme Classification, Publications, TheWebConf