Anomaly detection explained with football Blog Devoteam Revolve

Reading time: 3 minutes

It is Sunday morning: you are having your coffee and reading the newspaper, as you usually do. As a football fan, you go to the sports section to check the latest results of the previous matches and then… something grabs your attention (and almost makes you spill your coffee!). A second-division team has won the most competitive European championship! Only one thing comes to mind: “Whaaaat?!”.

Yes, noticing or discovering something unusual enough to stand out (i.e., being surprised!) is what we call Anomaly Detection (Cambridge Dictionary, N.A.).

Anomaly detection is a technique that can be applied to many different situations (Dutta & Vallabhajosyula, 2017):

Telecom: Detect roaming abuse, revenue fraud and service disruptions;
Banking: Flag abnormally high purchases/deposits and detect cyber intrusions;
Finance & Insurance: Detect and prevent fraudulent spending and travel-expense patterns;
Healthcare: Detect fraud in claims and payments;
Manufacturing: Detect abnormal machine behaviour to prevent cost overruns;
Social Media: Detect compromised accounts and bots that generate fake reviews;
Networking: Detect network intrusions;
Smart home: Detect energy leakage;
Video Surveillance: Detect or track objects and persons of interest.

Analytical as you are, you become curious about how this technique could identify atypical events for your favourite football team, for example unexpected wins or losses with unusual scores. But before searching for data, it is important to understand another key concept for applying Anomaly Detection techniques: the time-series.

Time-series “is a set of regular time-ordered observations (…) taken at successive, in most cases equidistant, periods/points of time.” (OECD.org, N.A.). In our football example, the time-series is the historical record of all the team's matches, ordered by date, with the number of goals scored and conceded in each match.
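To make this concrete, here is a minimal sketch of how match results could be stored as a time series in Python. The `Match` record, its field names (`played_on`, `goals_for`, `goals_against`) and the sample scores are hypothetical placeholders, not the dataset used for this article; the only essential point is that observations are ordered by time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Match:
    """One observation in the time series (hypothetical schema)."""
    played_on: date     # when the match was played (the time index)
    goals_for: int      # goals scored by our team
    goals_against: int  # goals conceded by our team


def as_time_series(matches):
    """Return the matches ordered by date, i.e. as a time-ordered series."""
    return sorted(matches, key=lambda m: m.played_on)


# Made-up results, deliberately out of order (not real data).
raw = [
    Match(date(2021, 2, 14), 1, 1),
    Match(date(2021, 1, 31), 2, 0),
    Match(date(2021, 2, 7), 0, 1),
]
series = as_time_series(raw)
print([m.played_on.isoformat() for m in series])
# ['2021-01-31', '2021-02-07', '2021-02-14']
```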

Combining the concepts of Anomaly Detection and Time-series, we can say that Time-series Anomaly Detection is the identification of rare events whose characteristics differ from those of the majority of the time-ordered data (DeepAI.org, N.A.).
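One simple way (among many) to apply this definition to time-ordered data is a rolling z-score: each value is compared with the mean and standard deviation of the values observed just before it. This is an illustrative technique chosen for the sketch, not necessarily the one used for this article, and the `window` and `threshold` values are arbitrary assumptions.

```python
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window=5, threshold=2.5):
    """Return indices whose value deviates strongly from the preceding window.

    `window` and `threshold` are illustrative assumptions, not tuned values.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # only past observations: order matters
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against; skip.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


print(rolling_zscore_anomalies([1, 0, -1, 1, 0, 6, 1, -1, 0, 1]))  # [5]
```

Because each point is judged only against its recent past, the same value can be normal in one period and anomalous in another, which is what distinguishes a time-series approach from a global threshold.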

Back to our example: we can assume that the objective of using this technique is to identify, in the historical data, the matches where our team won or lost by an unusual goal difference.

To simplify the analysis, let’s create a new variable called ‘net number of goals’ (num_goals_net), defined as the number of goals scored minus the number of goals conceded. After searching the Internet for a dataset and doing some data wrangling, we arrive at the plot below, which shows the values of this new variable over time for our team.
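In code, the new variable is a simple per-match subtraction. A minimal sketch with plain Python lists, assuming hypothetical `goals_for` and `goals_against` sequences that are already ordered by date:

```python
def num_goals_net(goals_for, goals_against):
    """Goals scored minus goals conceded, match by match (same order as input)."""
    if len(goals_for) != len(goals_against):
        raise ValueError("both sequences must describe the same matches")
    return [scored - conceded for scored, conceded in zip(goals_for, goals_against)]


# Made-up scores for illustration only.
goals_for = [2, 0, 1, 5, 1]
goals_against = [0, 1, 1, 0, 4]
print(num_goals_net(goals_for, goals_against))  # [2, -1, 0, 5, -3]
```

A positive value is a win, zero a draw and a negative value a loss, and the magnitude is the margin, which is exactly what the plot tracks over time.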

A quick look at the plot shows that most results over time fall in the range -2 to 2 in terms of ‘num_goals_net’. The lower bound corresponds to matches the team lost by 2 goals, and the upper bound to matches it won by 2 goals. Everything outside that range appears to be unusual, atypical or less frequent. After applying anomaly detection techniques, we can clearly see that this interpretation holds:
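The visual reading above can be written as a fixed-range check. This is only a sketch of that eyeball interpretation, with the bounds -2 and 2 taken from the plot description; it is not the anomaly detection algorithm that produced the red dots.

```python
def outside_range(values, lower=-2, upper=2):
    """Return (index, value) pairs that fall outside [lower, upper]."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


net = [2, -1, 0, 5, -3, 1]  # hypothetical num_goals_net series
print(outside_range(net))  # [(3, 5), (4, -3)]
```

The bounds are inclusive, so a 2-goal win or loss stays "normal"; only larger margins are flagged.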

The green area shows the majority of the events (around 90% of the cases), and the red dots are the anomalies detected by the algorithm.
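The article does not name the algorithm behind the plot. As a hedged illustration of the “green area” idea, one could derive the band from the data itself: keep the values between the 5th and 95th percentiles (roughly the central 90%) and flag the rest. The percentile approach and the `lower_pct`/`upper_pct` parameters are assumptions for this sketch; it uses the standard library's `statistics.quantiles`.

```python
from statistics import quantiles


def percentile_band_anomalies(values, lower_pct=5, upper_pct=95):
    """Flag values outside the [lower_pct, upper_pct] percentile band.

    With the defaults the band covers roughly the central 90% of the data,
    so roughly 10% of the points end up outside it by construction.
    """
    # 99 cut points: cuts[k - 1] is the k-th percentile (linear interpolation).
    cuts = quantiles(values, n=100, method="inclusive")
    lower, upper = cuts[lower_pct - 1], cuts[upper_pct - 1]
    flagged = [(i, v) for i, v in enumerate(values) if v < lower or v > upper]
    return (lower, upper), flagged


# Hypothetical num_goals_net series for 20 matches (made-up values).
net = [0, 1, -1, 2, 0, 1, -2, 0, 1, -1, 0, 6, 1, 0, -1, 2, 0, -5, 1, 0]
band, flagged = percentile_band_anomalies(net)
print(flagged)  # [(11, 6), (17, -5)]
```

Unlike the fixed -2 to 2 rule, the band adapts to each team's history, but the share of flagged matches is essentially chosen in advance through the percentiles, so the coverage level is a convention rather than a discovery.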

Curious to know more about anomaly detection techniques? Wondering how you could use them in your own business cases? Feel free to contact us for a more in-depth discussion!

References (accessed on 22 February 2021):

Cambridge Dictionary, N.A. ‘Anomaly’ [ONLINE]. Available here.
Cambridge Dictionary, N.A. ‘Detection’ [ONLINE]. Available here.
DeepAI.org, N.A. ‘Anomaly Detection’ [ONLINE]. Available here.
Dutta, S., Vallabhajosyula, R. R., 2017. ‘Anomaly Detection – Real World Scenarios, Approaches and Live Implementation’ [SLIDESHARE]. Available here.
OECD.org, N.A. ‘Time Series’ [ONLINE]. Available here.


About the author

Luiz Carlos De Jesus Junior
