Extreme Multilabel Classification in the Biomedical NLP Domain

This talk is based on a project I did for the Wellcome Trust, for which I had to develop a model that assigns the most relevant of 29K MeSH tags to biomedical grants. If you want to read more about this topic, here are some additional blog posts I have written: "A neural network tagging biomedical grants", "Tagging biomedical grants with 29K tags", and "Making an optimisation algorithm 10K times faster".

June 8, 2022 · 1 min

Making an optimisation algorithm 10k times faster 🏎

How we made our multilabel classification threshold optimizer converge in minutes instead of days. Multilabel classification is a common task in machine learning and Natural Language Processing (NLP). We approach it by training a model that can apply one or more labels to each new example it sees. Since the model outputs a probability for each label, one of the parameters we can tweak to improve its performance (measured, for example, by micro F1) is the threshold probability at which a label is applied....
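To make the role of the threshold concrete, here is a minimal sketch in plain Python. It is not the optimizer described in the post: the function names `apply_thresholds` and `micro_f1`, the toy labels and probabilities, and the hand-picked thresholds are all assumptions made for illustration. It only shows how a different threshold per label can change micro F1.

```python
def apply_thresholds(probs, thresholds):
    """Turn per-label probabilities into 0/1 predictions, one threshold per label."""
    return [[int(p >= t) for p, t in zip(row, thresholds)] for row in probs]


def micro_f1(y_true, y_pred):
    """Micro F1: pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): 4 examples, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
probs = [
    [0.9, 0.2, 0.4],
    [0.1, 0.6, 0.3],
    [0.7, 0.45, 0.2],
    [0.3, 0.1, 0.8],
]

# Same threshold for every label versus hand-picked per-label thresholds.
default_pred = apply_thresholds(probs, [0.5, 0.5, 0.5])
tuned_pred = apply_thresholds(probs, [0.5, 0.4, 0.35])

print(micro_f1(y_true, default_pred))  # 0.8
print(micro_f1(y_true, tuned_pred))    # 1.0
```

With a single threshold of 0.5, two true labels are missed. Lowering the thresholds for the second and third labels recovers them without adding false positives, which is the kind of gain a threshold optimizer searches for automatically.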

April 13, 2022 · 6 min

Tagging biomedical grants with 29K tags

In a previous post we spoke about a neural architecture we developed for classifying our grants with ~5K disease tags from the MeSH (Medical Subject Headings) hierarchy. In this post we will touch on the techniques needed to scale to a model that classifies all ~29K MeSH tags. Our dataset consists of 14M biomedical publications labelled with one or more MeSH tags (on average 12 tags per publication), so the challenge is both the thousands of outputs our model needs to recognise and the millions of examples it needs to learn from....

December 13, 2021 · 9 min