Access Restriction Subscribed

Author Jacquet, Philippe
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Topic detection ♦ Social networks ♦ Topic propagation ♦ String combinatorics
Abstract Mankind has never been as connected as it is now and as it will be tomorrow. Nowadays, thanks to the rise of social networks such as Twitter and Facebook, we can follow in real time the thoughts of millions of people. In fact, we can almost feel the thoughts of the whole of humanity and perhaps place ourselves in a position to predict the major trends in its collective behavior. However, such an ambitious aim would require considerable processing and networking resources, which may be far from affordable. Indeed, trends and topics are carried in a multitude of small texts written in various languages and vocabularies, much as a hologram carries information in a dispersed way. Their capture and classification pose serious problems of data mining and analytics. Processes based on pure semantic analysis would require too much processing power and memory. We will present alternative methods based on string complexity, also inspired by geolocation in wireless networks, which save processing power by several orders of magnitude. The ultimate goal is to detect when people are thinking about the very same topics before they become aware of it. Beyond the problem of topic detection and classification, one must also estimate the potential of an isolated topic to become a lasting trend. In other words, one must probe the topic's foundations, for example by challenging how trustworthy its sources are. Designing an efficient source-finder algorithm is inseparable from building realistic models of topic propagation. If we suppose that topics propagate inside communities via follower-followee links, the propagation is highly amplified by imbalances in the graph topology. It is established that dominating and semi-dominating nodes such as the CNN Twitter account are the main accelerators of topic propagation.
The difficulty is to find the actual source of a topic beyond those screening nodes, and the search is prone to false positive and false negative effects. In fact, we will show that finding the source of a topic is similar to finding a common ancestor in a Darwin channel where spurious mutations complicate the task.
Description Affiliation: Nokia Bell Labs, Nozay, France (Jacquet, Philippe)
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2014-01-10
Publisher Place New York
Journal ACM SIGMETRICS Performance Evaluation Review (PERV)
Volume Number 44
Issue Number 1
Page Count 1
Starting Page 125
Ending Page 125

