WWW 2014 Tutorial: Social Spam, Campaigns, Misinformation and Crowdturfing
Tutorial slides are here.
Presenters
Kyumin Lee, Assistant Professor, Department of Computer Science, Utah State University, kyumin.lee [at] usu.edu
James Caverlee, Associate Professor, Department of Computer Science and Engineering, Texas A&M University, caverlee [at] cse.tamu.edu
Calton Pu, Professor, School of Computer Science, Georgia Institute of Technology, calton [at] cc.gatech.edu
Topic and Description
The past few years have seen the rapid rise of many successful social systems – from Web-based social networks (e.g., Facebook, LinkedIn) to online social media sites (e.g., Twitter, YouTube) to large-scale information sharing communities (e.g., reddit, Yahoo! Answers) to crowd-based funding services (e.g., Kickstarter, IndieGoGo) to Web-scale crowdsourcing systems (e.g., Amazon MTurk, Crowdflower).
However, with this success has come a commensurate wave of new threats, including bot-controlled accounts in social media systems that disseminate malware and commercial spam messages, adversarial propaganda campaigns designed to sway public opinion, collective attention spam targeting popular topics and memes, and crowdturfing campaigns that propagate manipulated content.
This tutorial will introduce peer-reviewed research on information quality in social systems. Specifically, we will address new threats such as social spam, campaigns, misinformation and crowdturfing, and overview modern techniques for improving information quality by revealing and detecting malicious participants (e.g., social spammers, content polluters and crowdturfers) and low-quality content. In addition, this tutorial will overview tools for detecting these participants.
Structure of the Proposed Tutorial (Half day)
1. Introduction to Social Spam, Campaigns, Misinformation and Crowdturfing (15min)
o Overview of this tutorial
o What are social spam, campaigns, misinformation and crowdturfing? Show real examples of them.
o Why is social spam different from traditional spam such as email and web spam? Examples include:
o Openness. Anyone can create a social account, and it is easy to contact other users.
o URL blacklists are too slow at identifying new threats, allowing more than 90% of visitors to view a page before it becomes blacklisted [1].
o URL shortening services for obfuscation.
o Bots can be controlled automatically via APIs.
2. State-of-the-Art in Research on the Threats and Defenses
2.1. Social Spam (45min)
In this session, we will overview various social spam detection approaches:
o How to detect suspicious URLs [2].
o Social capitalists have helped spammers become well-established users and acquire social signals even as they spread spam across social networks. We will explain why this happens and how to penalize not only spammers but also these social capitalists [3].
o Supervised spam detection is the most popular approach for identifying social spammers and spam messages. Examples of social spam detected by classification approaches include YouTube video spam [4], Twitter spam [5], Foursquare spam tips [6] and collective attention spam [10].
o Social honeypots were proposed to monitor spammers' behaviors and collect their information [7].
o Using crowd wisdom to identify social spammers [8].
o Unsupervised social spam detection approach [9].
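To make the supervised (classification) approach concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier trained on a few hand-made example messages. The training data and the raw word-count features are illustrative assumptions for this tutorial page, not the feature sets used in the cited papers.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(labeled_messages):
    """labeled_messages: list of (text, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_messages:
        label_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def classify(text, model):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior + log likelihood with add-one (Laplace) smoothing
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Illustrative toy training set (an assumption, not real data).
training = [
    ("free followers click this link now", "spam"),
    ("win free money fast click here", "spam"),
    ("great dinner with friends tonight", "ham"),
    ("watching the game with my family", "ham"),
]
model = train(training)
print(classify("click for free followers", model))  # expected: spam
```

Real systems in the literature use far richer features (account age, follower/following ratios, URL properties) and stronger learners, but the train-then-classify pipeline is the same.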
2.2 Campaigns (40min)
We will introduce how these malicious participants form groups and run campaigns to target social systems more effectively, and overview campaign detection approaches:
o Graph-based social spam campaign detection [11].
o Content-driven campaign detection [12][13].
o Detecting and tracking political campaigns in social media using a classification approach [14].
o A frequent itemset mining method combined with behavioral models to detect fake reviewer groups [15].
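The campaign detection approaches above can be illustrated with a small sketch: messages are linked when they share an embedded URL, and each connected component of the resulting message graph is treated as a candidate campaign. The sample messages and the shared-URL linking rule are illustrative assumptions; the cited work uses richer content-similarity measures.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")

def find_campaigns(messages):
    """messages: list of message strings; returns list of index sets."""
    by_url = defaultdict(set)  # url -> indices of messages containing it
    for i, msg in enumerate(messages):
        for url in URL_RE.findall(msg):
            by_url[url].add(i)
    # Link messages that share any URL.
    adj = defaultdict(set)
    for indices in by_url.values():
        for i in indices:
            adj[i] |= indices - {i}
    # Extract connected components with a depth-first traversal.
    seen, components = set(), []
    for i in range(len(messages)):
        if i in seen or i not in adj:
            continue
        component, stack = set(), [i]
        while stack:
            j = stack.pop()
            if j in seen:
                continue
            seen.add(j)
            component.add(j)
            stack.extend(adj[j] - seen)
        components.append(component)
    return components

# Toy messages (an assumption for illustration only).
msgs = [
    "Amazing deal! http://spam.example/offer",
    "Don't miss this http://spam.example/offer today",
    "Check it out http://spam.example/offer right now",
    "Lovely weather in College Station today",
]
print(find_campaigns(msgs))  # messages 0-2 form one candidate campaign
```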
2.3. Misinformation (30min)
Can we trust information generated on social systems? This session will introduce what kinds of misinformation exist on social systems, and survey possible approaches to detecting it:
o Measuring information credibility on social media using classification approaches with crowdsourced labels [16].
o An automatic rumor detection approach on Sina Weibo, China's leading microblogging service [17].
o Identifying fake images on Twitter during Hurricane Sandy [18].
o Methods for assessing information credibility in emergency situations, combining an unsupervised approach and a supervised approach to detect message credibility [19].
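The credibility classification work surveyed above typically starts by aggregating simple features over all messages discussing a topic. A minimal sketch of this feature-extraction step follows; the four features chosen here are illustrative assumptions (the cited papers use much larger feature sets), and the resulting vector would then feed a standard classifier.

```python
def credibility_features(messages):
    """Compute topic-level features over the messages about one topic."""
    n = len(messages)
    return {
        "frac_with_url": sum("http" in m for m in messages) / n,
        "frac_questions": sum("?" in m for m in messages) / n,
        "frac_exclamations": sum("!" in m for m in messages) / n,
        "avg_length": sum(len(m) for m in messages) / n,
    }

# Toy topic (invented messages for illustration).
topic = [
    "BREAKING: shark swimming on the highway!!",
    "is this real? shark on the highway?",
    "official update on flooding http://news.example/sandy",
]
feats = credibility_features(topic)
print(feats)
```

Intuitively, topics dominated by exclamations and questions with few supporting links tend to score as less credible, which is the kind of signal such classifiers learn.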
2.4. Crowdturfing (35min)
Recently, malicious participants have started to take advantage of crowd power to spread manipulated information over social systems. This session will overview real examples of weaponizing crowdsourcing, along with techniques to identify manipulated content and the crowd workers who spread it on behalf of requesters:
o Introduce real examples reported by the news media.
o Understanding what kinds of crowdturfing tasks are available on crowdsourcing sites [20][21].
o Estimating the size of the crowdturfing market on both eastern and western crowdsourcing sites [21][22].
o Tracking and revealing crowdsourced manipulation of social media, focusing on western crowdsourcing sites, with an overview of how to detect crowdturfers on social media [21].
3. Challenges, Opportunities and Tools in Social Spam, Campaigns, Misinformation and Crowdturfing Research (15min)
o Review of open research challenges: the need for large, accurate, up-to-date data sets, and the integration of multiple techniques and research areas.
o Data management challenges: protecting users' privacy and addressing ethical issues around public data sets.
o Introduce useful tools for conducting research in the area:
o Big data analysis (e.g., MapReduce, Pig, Hive)
o Machine learning (e.g., Weka, Mallet)
o Visualization (e.g., Matplotlib, Graphviz)
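As a small taste of the big data tooling listed above, the MapReduce programming model can be illustrated in-process: a mapper emits (word, 1) pairs, a shuffle step groups pairs by key, and a reducer sums each group. On a real Hadoop cluster the same two functions would be distributed across machines; this single-machine sketch only shows the dataflow.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Combine all values emitted for one key.
    return (word, sum(counts))

def map_reduce(lines):
    # Shuffle: group mapper output by key, then reduce each group.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(line) for line in lines):
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

corpus = ["spam spam ham", "ham eggs spam"]
print(map_reduce(corpus))  # {'spam': 3, 'ham': 2, 'eggs': 1}
```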
References
[1] Grier, C., Thomas, K., Paxson, V., and Zhang, M. @spam: the underground on 140 characters or less. In CCS, 2010.
[2] Lee, S., and Kim, J. WarningBird: Detecting suspicious URLs in Twitter stream. In NDSS, 2012.
[6] Aggarwal, A., Almeida, J., and Kumaraguru, P. Detection of spam tipping behaviour on foursquare. In WWW Companion, 2013.
[16] Castillo, C., Mendoza, M., and Poblete, B. Information credibility on twitter. In WWW, 2011.
[17] Yang, F., Liu, Y., Yu, X., and Yang, M. Automatic detection of rumor on Sina Weibo. In SIGKDD Workshop on Mining Data Semantics, 2012.