MLOps with SageMaker — Part II

Customize train 🐳 In an earlier post we went through how to run a training script using sklearn, PyTorch or transformers with SageMaker by leveraging their preconfigured framework containers. The training scripts we used were self-contained, meaning they used only the respective framework and the Python standard library. This meant we only had to worry about uploading our data, fetching our model from S3, and deciding which instance type to use....

June 29, 2022 · 6 min

MLOps with SageMaker — Part I

How to effortlessly train sklearn 📊, pytorch 🔥, and transformers 🤗 models in the cloud SageMaker is a Machine Learning Operations (MLOps) platform, offered by AWS, that provides a number of tools for developing machine learning models, from no-code solutions to completely custom ones. With SageMaker, you can label data, train your own models in the cloud using hyperparameter optimization, and then deploy those models easily behind a cloud-hosted API. In this series of posts we will explore SageMaker’s services and provide guides on how to use them, along with code examples....

May 4, 2022 · 7 min

Doing algorithmic review as a team: a practical guide

Products that rely on data science run the risk of incorporating societal biases in the algorithms that power them, potentially causing unintended harms to their users. At Wellcome Data Labs we have been thinking about, and openly sharing, our journey towards surfacing and mitigating those harms. We split our review process into two streams: product impact analysis, which looks into harms that can be caused by the product irrespective of the algorithm, and the algorithmic review process, which looks into harms introduced by the model or data irrespective of the product. We have written about product impact analysis in the past, so this post discusses the second part of our review process in more detail....

March 24, 2021 · 7 min

A neural network tagging biomedical grants

Neural networks have been a ubiquitous part of the resurgence of Artificial Intelligence over the last few years. Unsurprisingly, then, we decided to use a neural network as the modelling approach for tagging our grants with MeSH. Neural networks have raised the state-of-the-art performance on the task from below 60% to 71%. Understandably, neural networks may feel complicated to someone outside the field of machine learning, but in this piece my goal is to make them as understandable as logistic regression and Principal Component Analysis (PCA)....

November 24, 2020 · 8 min

Assessing the fairness of our machine learning pipeline

My day-to-day job is to develop technologies that automate different processes at Wellcome through data science and machine learning. As a builder of these digital products, my team and I have to consider the unintended consequences they may have on users and on society more broadly. We know this is important because of famous cases where the consequences weren’t considered and harm was done. For example, the Google Photos tagging system and Amazon's hiring algorithm have been accused of unintended racist and sexist bias....

March 10, 2020 · 7 min

To Predict or to Explain

As data scientists, our day job revolves around modelling. We create models to recommend new products, to increase conversion rates, to explain user behaviour, etc. Depending on your background, you are more likely to be familiar with either machine learning techniques or regression-type analysis. It is the difference between these two distinct approaches to modelling that motivated me to write this post, which is a summary of a talk I gave recently at PyData London....

May 24, 2017 · 6 min

To explain or to predict?

This talk is about an interesting topic I started reading about at the start of my career as a data scientist. In particular, it covers the differences between using machine learning techniques for prediction and using more explanatory techniques, such as regression, that are common in the sciences.

May 7, 2017 · 1 min