24 hours datathon

Armed with an espresso machine, we headed off to JADS in 's-Hertogenbosch early on a Saturday morning. Our mission for the next 24 hours: using artificial intelligence (AI) as a weapon to fight cyberbullying. Feeling pretty confident with a strong team consisting of three data scientists (Axel, Bruno, David), a software engineer (Jordy) and an ethical hacker (Dennis), we were ready to face this exciting challenge.

Cyberbullying as a serious problem

Patrica Bolwerk, founder and director of Stop Pesten Nu, gave an overview of cyberbullying today and told us that over 400,000 young people are bullied online via social media. A pretty big shock, right?! It is a serious problem that calls for innovative solutions.

Rules of the game

At 11:30, the organization explained the case, rules and deadlines to the 10 teams. The official kick-off was at 12:00: 24 hours to brainstorm, prototype and pitch our AI solution to the jury.

12:00 - 18:00

Let the brainstorm begin

We started brainstorming right away with a clear goal in mind: a real product powered by artificial intelligence! Our first thought was to build a “man in the middle” tool that captures all online traffic to detect cyberbullying. Initially it sounded like a good idea, but we quickly realised it would never work in practice due to privacy issues. In fact, most of our initial ideas ran into the same legal issue, which makes it difficult to collect cyberbullying data.

Power of the stories

During our brainstorm session, Michel Schrama (policy officer at the Kindertelefoon) and Anton Horreweg (teacher and behavioural specialist) came over to talk to us. They told us that the “Kindertelefoon” has an online forum where children who are being bullied can write and share their experiences. These stories are monitored by employees of the “Kindertelefoon”. We realised that the power is actually in these stories and that we needed to find a solution that uses this content. Thanks to this helpful information, we had our “aha moment”. The idea was born.

18:00 - 21:00

Our solution

Inspired by the story of the “Kindertelefoon”, we decided to create a platform where people can upload their personal stories and to build an AI model that automatically searches for similar stories. The purpose is to let people know that they are not alone; other people out there are facing the same problems. You can call it a “buddy system” if you like, with an additional coaching and advisory system. The platform encourages everyone to upload a story: someone who is bullied, someone who is the actual bully, or a so-called witness (for example a mother of a bullied child). The reason for this is that research shows that 90% of the population can be categorized in the “witness” group. To make sure people feel comfortable sharing their stories, every story will be anonymous.

Chatbot

Instead of having employees monitor the stories, we built a chatbot. One major benefit is that the chatbot is available 24/7 and can respond to a user instantly. It can also find a similar story very quickly, so no time is lost in providing help. The chatbot can also communicate with the trained AI model and look up the historical data related to the user's story. Taking this into account, it can provide better help and support to the user. If the chatbot notices a serious call for help, for example a suicide element in the story, it will use its so-called “emergency function”, which encourages the user to get professional help.

21:00 - 00:00

THE SEARCH FOR DATA

Cyberbullying is a taboo. Finding the right data on social media seemed to be impossible, because people don’t write about this stuff in public. The data that we did find was a Twitter dump based on hashtags, for example #bullied, #bullying, #cyberbullying. Unfortunately, this data was not useful. After some research, we found a lot of stories in forums and discussion groups. We decided to take the stories from http://www.wordswound.org to build and train our model. To extract all this data automatically from the website, we built a simple web crawler in Python.
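To give an idea of what such a crawler involves, here is a minimal sketch using only the Python standard library. It is not the script we ran during the datathon: the class and function names are made up for illustration, and treating every `<p>` element as story text is an assumption about the page layout that a real site would probably not satisfy without extra filtering.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ParagraphExtractor(HTMLParser):
    """Collect the text inside every <p> element of a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Join the collected pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def fetch_paragraphs(url):
    """Download a page and return its paragraphs (assumes UTF-8 content)."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_paragraphs(html)


# Offline example on a small hand-written page.
sample = (
    "<html><body><h1>Stories</h1>"
    "<p>I was  bullied\nat school.</p>"
    "<p>It got <b>better</b>.</p>"
    "</body></html>"
)
stories = extract_paragraphs(sample)
print(stories)
```

A real crawler would also have to follow the links to the individual story pages and respect the site's terms of use; both are left out here.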

TRAINING THE AI MODEL

We extracted 265 stories and labelled them manually (title, story and story type) to train our model and structured the story types into 3 categories:

V=victim
W=witness
B=bully

This way, the AI model knows who has written the story, so it can find a better matching story. The right category also makes sure that the chatbot gives the right advice or answer to the user. The AI model keeps learning, so eventually it will be able to label new stories automatically.

00:00 - 03:00

FINDING SIMILAR STORIES (TECHY WARNING!)

The AI model finds stories similar to the user’s story using TF-IDF. This is an intuitive, yet powerful way of encoding textual stories as a series of numbers. A user sends a story and the AI model searches for similar stories. For every unique word in the text, it finds the term frequency (TF) and multiplies this by the word's inverse document frequency (IDF). Let’s have a look at the two components.

Term frequency (TF)
The term frequency is the number of occurrences of a word in a story. A word that occurs often in a story tells us a lot about its topics. Eventually, we would like to match stories that have very similar topics. Users that talk a lot about “bullied at school” should see stories of other users that talk a lot about “bullied at school”, for example. Long stories contain more words, and words might occur more often. Therefore, to counter this, we divide the term frequency of each word by the length of its story. Writing ‘bullied at school’ twice in a text of 400 words has more weight than in a text of 800 words.

             Word 1   Word 2   Word 3   Word 4   Word 5
Sentence 1   0        2/20     1/20     0        2/20
Sentence 2   3/18     2/18     0        0        0
Sentence 3   1/6      1/6      1/6      1/6      2/6
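The length-normalised term frequency described above can be sketched in a few lines of Python. This is an illustration rather than our datathon code: the function name is hypothetical, and splitting on whitespace is a simplifying assumption (a real tokenizer would also strip punctuation).

```python
from collections import Counter


def term_frequencies(story):
    """Return each word's count divided by the number of words in the story."""
    words = story.lower().split()
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


# "bullied" occurs twice in a story of six words.
tf = term_frequencies("bullied at school and bullied online")
print(tf["bullied"])
```

Because of the division by the story length, the same two occurrences would count for half as much in a story that is twice as long.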

Inverse document frequency
The inverse document frequency is based on the number of stories that contain a word. It can be seen as a boost that a word gets for being rare. The more stories a word appears in, the smaller its inverse document frequency will be. For example, most stories will contain the word ‘bully’, so this word doesn’t help us in finding similar stories. We can compute the inverse document frequency for every word to create a list (the higher the number, the rarer the word):
Word 1   Word 2   Word 3   Word 4   Word 5
1.3      1        1.3      2        1.3
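As a rough sketch of how such a list can be computed: the code below uses one common variant of the formula, ln(N / df) + 1, where N is the number of stories and df the number of stories containing the word. Which variant we used is not stated in this post, so the numbers it produces will not exactly match the list above; the function name and the tiny example stories are made up for illustration.

```python
import math


def inverse_document_frequencies(stories):
    """Map each word to ln(N / df) + 1 for a list of tokenized stories."""
    n_stories = len(stories)
    vocabulary = {word for story in stories for word in story}
    idf = {}
    for word in vocabulary:
        # Document frequency: the number of stories that contain the word.
        df = sum(1 for story in stories if word in story)
        idf[word] = math.log(n_stories / df) + 1.0
    return idf


stories = [
    ["bully", "school", "lunch"],
    ["bully", "online", "school"],
    ["bully", "sister", "online"],
]
idf = inverse_document_frequencies(stories)
for word in sorted(idf, key=idf.get):
    print(word, round(idf[word], 2))
```

The word "bully" appears in every story and gets the lowest value, while "lunch" and "sister" each appear in only one story and get the highest.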

Multiply and voilà
To find similar stories, we can simply multiply the term frequencies by the inverse document frequencies and compare the resulting series of numbers with each other. This can be done using various distance measures; a popular one is the cosine distance. The story that has the lowest cosine distance to our new story will most likely be the most related.
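Putting the pieces together, a minimal pure-Python sketch of the matching step could look like this. It is a sketch under assumptions, not our actual implementation: the function names and example stories are hypothetical, tokenization is a plain whitespace split, the IDF variant ln(N / df) + 1 is just one common choice, and the new story is included when computing the IDF values so that all its words have a weight.

```python
import math
from collections import Counter


def tfidf_vectors(stories):
    """Turn each story into a list of TF-IDF weights over a shared vocabulary."""
    tokenized = [story.lower().split() for story in stories]
    n_stories = len(tokenized)
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    idf = {
        word: math.log(n_stories / sum(1 for tokens in tokenized if word in tokens)) + 1.0
        for word in vocabulary
    }
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid dividing by zero for an empty story
        vectors.append([counts[word] / length * idf[word] for word in vocabulary])
    return vectors


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def most_similar(new_story, known_stories):
    """Return the index of the known story closest to the new story."""
    vectors = tfidf_vectors(known_stories + [new_story])
    query = vectors[-1]
    distances = [cosine_distance(query, vector) for vector in vectors[:-1]]
    return min(range(len(known_stories)), key=distances.__getitem__)


known = [
    "they bullied me at school every day",
    "someone posted mean comments about me online",
    "my brother was teased at football practice",
]
best = most_similar("i am bullied at school", known)
print(known[best])
```

In practice a library such as scikit-learn provides the same building blocks with better tokenization and sparse vectors, which matters once there are more than a handful of stories.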

03:00 - 06:00

Every man on a mission

At midnight, everyone was busy with their own task. Jordy started building the frontend of the app and the chatbot function. Axel worked on the backend, where he faced some midnight frustrations while installing the software. Dennis was busy collecting all the data, running the Python script. David and Bruno started labelling the 265 stories into the victim, bully and witness categories. At this point, we also started to develop an API system.

Bedtime!

At 3 o’clock we had a working version, but we still needed to link the AI model to the app. We decided to get a few hours of sleep. Bruno didn’t even make it to his bed and fell asleep under his desk. What a work ethic 😉

06:00 - 13:00

Ready to shine

We got up at 6 in the morning to start connecting the app to the model. We took care of the last minor bugs and tested the app and the chatbot. Check the video below for a demo of our final product. Everything seemed to work perfectly, so now it was time to put everything in place and prepare the presentation. We were ready to shine.

Demo of our final product

Runner up!

Spotlight on and ready to take the stage. As David pitched our solution, we felt extremely proud of the result that we had achieved in less than 24 hours. And our pride turned out to be justified! The jury was impressed by our innovative idea and by the fact that we had actually built a working product. Final result: runner-up. Of course, finishing in first place would have been even better, but we were very happy with the result and our product. Most of all, we truly hope that our solution can help in the fight against cyberbullying.