University of Waterloo
200 University Ave W, Waterloo, ON
N2L 3G1
Phone: (519) 888-4567
Staff and Faculty Directory
Contact the Department of Electrical and Computer Engineering
Rania Ibrahim
Scalable Topic Detection Approaches from Twitter Streams
Fakhreddine Karray
Real-time topic detection in Twitter streams is an important task: it helps discover natural disasters in real time from users’ posts, and it helps political parties and companies understand users’ opinions and needs. In 2014, Twitter was reported to have more than 288 million active users posting around 500 million tweets daily. Detecting topics from Twitter streams in real time therefore requires scalable and efficient techniques. In this work, we scale an Exemplar-based technique that detects topics from Twitter streams, where each detected topic is represented by one tweet (i.e., an exemplar). Representing the detected topics by exemplar tweets makes them easier to interpret than representing them by uncorrelated terms, as other topic detection algorithms do. The approach is implemented using Apache Giraph and is being extended to efficiently support sliding windows. Experimental results show that the Giraph Exemplar-based approach achieves the best topic recall and term precision compared with related distributed approaches, while maintaining good term recall and running time.
Moreover, this work proposes another clustering technique, Local Variance-based Clustering (LVC), for detecting topics from Twitter streams. LVC defines the densities of data points based on their similarities. The proposed local variance measure is calculated from the variance of each data point's similarity histogram and is shown to distinguish well between core, border, connecting, and outlier points. Experimental results show that LVC detects topics from Twitter with 15% higher topic recall and 3% higher term precision than DBSCAN.
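To make the idea of a local variance measure more concrete, the following is a minimal Python sketch of one possible reading of it. The abstract does not give the exact definition, so everything here is an assumption for illustration: the function names (`cosine_similarity`, `similarity_histogram`, `local_variance`), the use of cosine similarity over bag-of-words vectors, the choice of 10 equal-width bins, and the sample tweets are all hypothetical and are not taken from the author's implementation.

```python
import math
from collections import Counter
from statistics import pvariance


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def similarity_histogram(sims, bins=10):
    """Fraction of similarity values (each in [0, 1]) in each equal-width bin."""
    counts = [0] * bins
    for s in sims:
        # Clamp so that a similarity of 1.0 lands in the last bin.
        counts[min(int(s * bins), bins - 1)] += 1
    total = len(sims)
    return [c / total for c in counts] if total else [0.0] * bins


def local_variance(index, vectors, bins=10):
    """Variance of one point's similarity histogram against all other points.

    Assumed definition, for illustration only: the source does not state
    how the histogram is built or normalized.
    """
    sims = [
        cosine_similarity(vectors[index], v)
        for j, v in enumerate(vectors)
        if j != index
    ]
    return pvariance(similarity_histogram(sims, bins))


# Hypothetical sample tweets, not data from the experiments.
tweets = [
    "earthquake hits the city centre",
    "strong earthquake reported in the city",
    "earthquake shakes city buildings",
    "new phone released today",
    "my cat is sleeping",
]
vectors = [Counter(t.lower().split()) for t in tweets]
scores = [local_variance(i, vectors) for i in range(len(vectors))]
```

Under this reading, each tweet receives one score derived from how its similarities to all other tweets are distributed. How those scores would be thresholded to label core, border, connecting, and outlier points is not described in the abstract and is left out of the sketch.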