In real-world applications, the traditional approach of mapping a single input to a single output may not be sufficient. Sometimes we need multiple labels, or outputs, to be associated with each input, providing greater flexibility and functionality in complex applications. This is where the multi-label design pattern comes in.
In this article, we will explore the multi-label design pattern in depth: its benefits, and how it solves the problem of handling multiple outputs.
In machine learning, multi-label classification or multi-output classification is a variant of the classification problem, where multiple non-exclusive labels may be assigned to each instance.
Multi-label classification is a generalisation of multiclass classification, which is the single-label problem of categorising instances into precisely one of several (more than two) classes. In the multi-label problem, the labels are non-exclusive and there is no constraint on how many of the classes the instance can be assigned to.
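To make the distinction concrete, a multi-label target is typically represented as a binary indicator matrix with one column per label, where each row may contain any number of ones. A minimal sketch using scikit-learn's MultiLabelBinarizer (the label names here are purely illustrative):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each sample may carry any number of non-exclusive labels, including none
samples = [{"news", "sports"}, {"news"}, set(), {"sports", "finance"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(samples)

print(mlb.classes_)  # labels in alphabetical order: finance, news, sports
print(Y)             # one row per sample, one column per label
```

Contrast this with multiclass classification, where each row would contain exactly one 1.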
Multi-label classification design pattern in action
Here are a few real-world examples of how the multi-label classification design pattern has been used in practice and the benefits it has brought to organisations:
In medical diagnosis, a patient’s symptoms may not be exclusive to a single disease or condition. A multi-label classification approach allows for multiple potential diagnoses to be considered simultaneously, leading to more accurate diagnoses and treatments.
In a study published in the Journal of Biomedical Informatics, a multi-label classification approach was used to diagnose five different types of thyroid diseases, achieving an accuracy rate of 98.48%.
Social media platforms generate vast amounts of data, and classifying content accurately is crucial for various tasks such as sentiment analysis, spam detection, and content moderation. In a study published in the Journal of Intelligent Information Systems, a multi-label classification approach was used to classify Twitter posts according to 17 different categories, resulting in an accuracy rate of 87%.
In image classification, multi-label classification is often used to identify multiple objects or features in an image simultaneously. For example, in a study published in the journal Remote Sensing, a multi-label classification approach was used to identify different land cover features in remote sensing imagery, resulting in an accuracy rate of 90%.
In each of these examples, the multi-label classification approach led to improved accuracy and efficiency compared with traditional single-label approaches. Furthermore, it allowed for more complex and nuanced classifications, enabling organisations to extract more value from their data.
Here is a code snippet in Python using Scikit-learn library to demonstrate the implementation of the multi-label classification design pattern:
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import hamming_loss, precision_score, recall_score, f1_score
# Create a synthetic dataset with 100 samples and 10 features
X, y = make_multilabel_classification(n_samples=100, n_features=10, n_classes=5)
# Train a Decision Tree classifier using MultiOutputClassifier
dtc = DecisionTreeClassifier()
clf = MultiOutputClassifier(dtc).fit(X, y)
# Evaluate the model
y_pred = clf.predict(X)
hl = hamming_loss(y, y_pred)
precision = precision_score(y, y_pred, average='macro')
recall = recall_score(y, y_pred, average='macro')
f1 = f1_score(y, y_pred, average='macro')
print('Hamming Loss: ', hl)
print('Precision: ', precision)
print('Recall: ', recall)
print('F1 Score: ', f1)
In this example, we create a synthetic dataset with 100 samples and 10 features, and 5 output classes. We then train a Decision Tree classifier using MultiOutputClassifier, which allows for multi-label classification. Finally, we evaluate the performance of the model using Hamming Loss, Precision, Recall, and F1 Score.
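Note that the snippet above predicts on the same data the model was trained on, so the reported scores are optimistic. A more realistic sketch holds out a test set before evaluating (the split ratio and random seeds here are illustrative choices):

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import hamming_loss, f1_score

# Synthetic multi-label data: 500 samples, 10 features, 5 labels
X, y = make_multilabel_classification(
    n_samples=500, n_features=10, n_classes=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training split only
clf = MultiOutputClassifier(DecisionTreeClassifier(random_state=42))
clf.fit(X_train, y_train)

# Evaluate on the held-out split
y_pred = clf.predict(X_test)
hl = hamming_loss(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='macro', zero_division=0)
print('Hamming Loss:', hl)
print('F1 Score (macro):', f1)
```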
The scikit-multilearn Python package caters specifically to multi-label classification. It provides multi-label implementations of several well-known techniques, including SVM, kNN and many more. The package is built on top of the Scikit-learn ecosystem.
Problem transformation methods
Several problem transformation methods exist for multi-label classification; they can be roughly broken down into the following approaches.
The default strategy, referred to as the binary relevance technique, trains a separate binary classifier for each label. The combined model then predicts, for an unseen sample, all the labels for which the individual classifiers output a positive result.
Although this method of dividing the task into multiple binary tasks may superficially resemble the one-vs.-all (OvA) and one-vs.-rest (OvR) methods for multiclass classification, it is essentially different from both, because under binary relevance each classifier deals with a single label, without any regard to the other labels.
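In scikit-learn, binary relevance is essentially what MultiOutputClassifier does under the hood: it clones the base estimator once per label column. A hand-rolled sketch makes the idea explicit (decision trees as the base learner are an illustrative choice, matching the earlier snippet):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier

X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=4, random_state=0
)

base = DecisionTreeClassifier(random_state=0)

# One independent binary classifier per label column,
# each trained without any regard to the other labels
classifiers = [clone(base).fit(X, Y[:, j]) for j in range(Y.shape[1])]

# The combined prediction stacks the per-label outputs column-wise
Y_pred = np.column_stack([c.predict(X) for c in classifiers])
print(Y_pred.shape)  # (200, 4)
```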
An alternate approach for splitting up a multi-label classification problem into numerous binary classification problems is to use a classifier chain. In contrast to binary relevance, labels are predicted sequentially, and the output of all previous classifiers before it (positive or negative for a given label) are provided as features to the classifiers after it.
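scikit-learn ships this approach as ClassifierChain; a minimal sketch (the base learner and seeds are illustrative):

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import ClassifierChain
from sklearn.tree import DecisionTreeClassifier

X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=4, random_state=0
)

# Each classifier in the chain sees the original features plus the
# outputs of all classifiers earlier in the chain order
chain = ClassifierChain(DecisionTreeClassifier(random_state=0), random_state=0)
chain.fit(X, Y)

Y_pred = chain.predict(X)
print(Y_pred.shape)  # (200, 4)
```

Because later classifiers condition on earlier predictions, chains can capture correlations between labels that binary relevance ignores, at the cost of sensitivity to the chosen label order.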
The label powerset (LP) transformation maps every label combination present in the training set to a single class, turning the multi-label problem into one multi-class classification problem.
For example, if the possible labels for an example were A, B, and C, the label powerset representation of this problem is a multi-class classification problem with the classes [0 0 0], [1 0 0], [0 1 0], [0 0 1], [1 1 0], [1 0 1], [0 1 1], [1 1 1], where for example [1 1 0] denotes an example in which labels A and B are present and label C is absent.
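A minimal label powerset sketch maps each distinct row of the label matrix to a single class index, trains one multi-class classifier, and maps predictions back to label vectors (all variable names are illustrative):

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier

X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, random_state=0
)

# Map each distinct label combination observed in training
# to a single multi-class label
combos, y_lp = np.unique(Y, axis=0, return_inverse=True)
y_lp = y_lp.reshape(-1)  # flatten for compatibility across NumPy versions
print(len(combos), 'label combinations observed')

# One multi-class classifier over the combination indices
clf = DecisionTreeClassifier(random_state=0).fit(X, y_lp)

# Predictions are combination indices; look them up to recover label vectors
Y_pred = combos[clf.predict(X)]
print(Y_pred.shape)  # (200, 3)
```

A known limitation of this transformation is that the model can only ever predict label combinations seen during training, and rare combinations yield very few examples per class.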
ABOUT THE AUTHOR
Jaafar works as a data scientist and is passionate about Machine Learning and MLOps. His thirst for learning is matched only by his desire to share. For more information: https://jaafarbenabderrazak-info.medium.com/
The reBirth blog
We fight against intellectual shortcuts, propose alternatives, challenge established practices, share our experiences and provoke a reaction. In this spirit we TAKE INITIATIVE and reveal singularities.