Wednesday, June 25, 2014

Soccer, Brazil's passion, and big data: a relationship of love and hate


In a room on the ninth floor of the IBM building in Botafogo, Rio de Janeiro, a group of researchers and consultants closely watches the World Cup games. They watch every match, fueled by plenty of pizza and soda. Like all Brazilians, they cheer the best plays and suffer through the poorer games. But rather than rooting for one side or the other, they are there to work on an IBM Research project. What they do is "listen" to all the fans who "speak" on Twitter about the games and the players and, from there, try to understand whether the fans like what they see. The goal is to demonstrate a technology developed in IBM labs and turned into a service that performs sentiment analysis on social networks. It is named FAMA ("fame" in Portuguese, and also the name of the goddess of rumor).

The process is complex. Hundreds of thousands of tweets are posted during every game. In yesterday's game between Brazil and Cameroon alone, a single player, Neymar, had his name mentioned in 409,971 posts. That is 50,000 tweets more than the entire game between the United States and Portugal received. Not bad for a single player. At the end of the first half, he was receiving positive comments in 43% of all tweets.

During the game, the Brazilian team received 1,563,387 mentions and the Cameroonian team 130,846. Among the references to our team, 45% were positive, 16% were neutral and 39% were negative. Clearly, opinion on our national team is far from unanimous.


"Termômetro Social", from the Globo organization's app "Segunda Tela"

How to stay tuned?

All results of the analyses are published on social media, currently on the ESPN site and, through a partnership with Globo, in its "second screen" app. Here is how to follow them:
  • The ESPN website, Torcida nas Redes, presents the analyses made by the IBM team.
  • The app "Segunda Tela da Globo" lets fans take part in chats, shows general statistics about the games and tracks sentiment on the social network (the latter in partnership with IBM).


And how does it work?

The process is very interesting and happens in real time. At a high level, FAMA monitors everything posted on Twitter about the World Cup. To do this, it relies on a special dictionary that basically lets the system know whether a tweet is about football or not (other applications would need other dictionaries). Each tweet is analyzed and, if it is found to match the theme, it is selected for study. From there, five steps follow:
  • The words that make up each tweet are separated from each other in a process called parsing (or tokenization).
  • Then the words are normalized, that is, errors are corrected and, where appropriate, synonyms are substituted.
  • After that, each word is categorized according to the rules of Portuguese grammar. Adjectives, nouns, verbs, and so on are identified.
  • Next, the lemma of each word is found. This is a particularly difficult step because it depends on context (it is not just a matter of finding the root of a word).
  • Finally, the "sentiment" of each word is returned. It can be positive, negative or neutral.
The "sentiment" returned for each word was learned beforehand through other techniques and repetition (technically speaking, it is taught to the algorithm). Once we have the sentiment of each word, we simply need to calculate the sentiment of the entire tweet. Finally, a statistical analyzer calculates how often the players' names are mentioned, which themes come up most frequently, and so on. The result is then presented in a comparative way.
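
To make the flow above more concrete, here is a minimal sketch in Python. It is not FAMA's actual code. Every table in it (topic terms, normalization map, lemma map, word sentiments, player names) is a tiny, hypothetical stand-in for the dictionaries and learned values described above, and the grammatical categorization step is left out for brevity. The tweet score is assumed here to be a plain sum of word scores, which is only one possible way to combine them.

```python
import re
from collections import Counter

# All tables below are toy, hypothetical data for illustration only.
TOPIC_TERMS = {"brazil", "cameroon", "neymar", "goal", "game", "worldcup"}
NORMALIZE = {"gooool": "goal", "gr8": "great"}    # spelling fixes / synonyms
LEMMAS = {"played": "play", "playing": "play", "games": "game"}
WORD_SENTIMENT = {"great": 1, "love": 1, "bad": -1, "awful": -1}
PLAYERS = {"neymar", "fred"}


def tokenize(text):
    """Split a tweet into lowercase word tokens."""
    return re.findall(r"[\w']+", text.lower())


def is_on_topic(tokens):
    """Dictionary check: does the tweet mention any topic term?"""
    return any(token in TOPIC_TERMS for token in tokens)


def analyze(text):
    """Return (label, lemmas) for an on-topic tweet, or None otherwise."""
    tokens = tokenize(text)
    if not is_on_topic(tokens):
        return None
    tokens = [NORMALIZE.get(t, t) for t in tokens]   # normalization
    lemmas = [LEMMAS.get(t, t) for t in tokens]      # toy lemma lookup
    # Assumption: tweet sentiment is the sum of the word sentiments.
    score = sum(WORD_SENTIMENT.get(lemma, 0) for lemma in lemmas)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, lemmas


def summarize(tweets):
    """Count sentiment labels and player mentions over on-topic tweets."""
    labels, mentions = Counter(), Counter()
    for text in tweets:
        result = analyze(text)
        if result is None:
            continue                                 # off-topic tweet
        label, lemmas = result
        labels[label] += 1
        mentions.update(l for l in lemmas if l in PLAYERS)
    return labels, mentions
```

Calling `summarize` on a list of tweet texts returns two counters, one with the number of positive, negative and neutral tweets and one with how often each player was named. Those counts are the raw material for the comparative view.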

The process is fairly complex. We all know that words can have different meanings depending on how they are used. For example, the verb "go" is usually neutral, but in football, when used in "Go Brazil", it has a positive connotation. When it is used in "let's go, the game is bad", however, it has a negative meaning. Other words pose the same challenge. To solve it, a preliminary manual step is necessary, in which analysts assemble a polarity table.
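
One simple way to picture such a polarity table is as a lookup in which multi-word expressions take precedence over single words. The Python sketch below is only an illustration under that assumption. The table entries, the longest-match rule and the function name `tweet_polarity` are all hypothetical, and the real table assembled by the analysts may look quite different.

```python
import re

# Hypothetical polarity table. Multi-word entries capture context and are
# tried before single-word entries.
POLARITY = {
    ("go", "brazil"): 1,    # cheering
    ("let's", "go"): -1,    # wanting to leave
    ("go",): 0,             # neutral on its own
    ("bad",): -1,
}
MAX_LEN = max(len(key) for key in POLARITY)


def tweet_polarity(text):
    """Sum polarities, preferring the longest matching expression."""
    tokens = re.findall(r"[\w']+", text.lower())
    score, i = 0, 0
    while i < len(tokens):
        for n in range(MAX_LEN, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in POLARITY:
                score += POLARITY[key]
                i += len(key)       # skip the words just consumed
                break
        else:
            i += 1                  # no entry starts at this word
    return score
```

With this toy table, "Go Brazil" scores positive and "let's go, the game is bad" scores negative, while "go" on its own contributes nothing.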

Where else can FAMA be used?

Applying all this technology and methodology in other environments is IBM's main focus. Cognitive computing is emerging as the next big promise of the technology industry. Analyzing the large amounts of data available on social networks has the potential to provide extremely valuable insights to companies of all industries and sizes. Imagine being able to analyze, in real time, how a bank's customers feel about a newly launched product and to adjust marketing campaigns immediately. Or being able to offer differentiated products and services based on customers' feelings. The applications are numerous and can transform various industries such as retail, financial services, and many others.
