Useful Out-of-the-Box Machine Learning Models
With the growing popularity of data science, out-of-the-box machine learning models are becoming increasingly available in free, open-source packages. These models have already been trained on datasets that often large and therefore time-consuming. They are also packaged up in ways that are easy to use, thus simplifying the process of applying data science. While there is no replacing customising a model to suit your objectives, these packages are useful for obtaining quick results and provide a gentle, easy introduction into data science models. In this article, we touch on some of these tools and their applications to well-known machine learning problems.
Image classification using out-of-the-box machine learning models
Image classification is a machine learning task where an image is categorised into one of several categories through identifying the object in the image. It is easy for humans to recognise the same object shown in different backgrounds and different colours, and it turns out that it’s also possible for algorithms to classify images up to a certain amount of accuracy. Significant advances have been made in image classification as well as other computer vision problems with deep learning, and some of these available as out-of-the-box machine learning models in keras/tensorflow.
As an example, the inception-v3 model, trained on ImageNet, a database of over 14 million labelled images, is available as a tool for prediction. It is easy to set up, only taking a couple lines of code before you’re ready to begin classifying your own images – simply read in the image and the model will return a category and a score. The inception-v3 and other image classification models can also be fine-tuned by keeping the weights of some of the layers fixed, and tuning the weights of other layers by training on a new, more relevant dataset. This is a well-known procedure of transfer learning to customise the model to the new data. It is especially useful when the new dataset is small.
Named Entity Recognition
With Named Entity Recognition (NER), the goal of the task is to identify named entities within a chunk of text, such as the name of an organisation or a person, geographical locations, numerical units, dates and times, and so forth. This is useful for automatic information extraction, eg. in articles, reports, or invoices, and saves the effort of having a human manually read through a large number of documents. NER algorithms can help to answer questions such as which named entities are mentioned most frequently, or to consistently pick out a monetary value within the text.
Spacy has a NER module available in several languages, as well as a range of text processing capabilities such as part-of-speech tagging, dependency parsing, lemmatization etc. Installation of the package is straightforward and no more than a few lines of code is required to begin extracting entities. Spacy v2.0 NER models consist of subword features and a deep convolutional neural network architecture, and the v3.0 models have been updated with transformer-based models. Spacy also has the functionality to allow training on new data to update the model and improve accuracy, as well as a component for rule-based entity matching where a rule-based approach is more convenient.
Sentiment Analysis using out-of-the-box machine learning models
Sentiment analysis is used for identifying the polarity of a piece of text, i.e, whether it is positive, negative or neutral. This is useful for monitoring customer and brand sentiment, analysing any text-based feedback. More recently, it has been used for analysing public sentiment to the COVID-19 pandemic through social media or article headlines.
Hugging face transformers is a library containing a range of well-known transformer-based models that have obtained state-of-the-art results in a number of different natural language processing tasks. It includes both language models and task-specific models, and has a sentiment analysis model with a base architecture of distilBERT (a smaller/faster version of the BERT language model) which is fine-tuned for the sentiment analysis downstream task using the SST-2 dataset. The model returns positive and negative labels (i.e, excludes neutral) as well as a confidence score. Other options for an out-of-the-box sentiment analysis model are TextBlob, which has both a rules-based sentiment classifier and a naive-bayes model trained on movie reviews, and Stanza, which has a classifier based on a convolutional neural network architecture, and can handle English, German and Chinese texts. Both TextBlob and Stanza return a continuous score for polarity.
Transfer learning using pre-trained models
While there is a lot of value in an out-of-the-box model, its accuracy may not be the same when applied directly to an unseen dataset. This usually depends on how similar the characteristics of the new dataset is to the dataset the model is trained on – the more similar it is, the more the model’s results can be relied on. If direct use of an out-of-the-box model is not sufficiently accurate and/or not directly relevant to the problem at hand, it may still be possible to apply transfer learning and fine-tuning depending on package functionality. This involves using new data to build additional layers on top of an existing model, or to retrain the weights of existing layers while keeping the same model architecture.
The idea behind transfer learning is to take advantage of the model’s general training, and transfer its knowledge to a different problem or different data domain. In image classification for example, the pre-trained models are usually trained on a large, general dataset, with broad categories of objects. These pre-trained models can then be further trained to classify, for example cats vs dogs only. Transfer learning is another way to make use of out-of-the-box machine learning models, and is an approach worth considering when data is scarce.