WWW 2014 Tutorial: Social Spam, Campaigns, Misinformation and Crowdturfing

Tutorial slides are here.

Presenters

Kyumin Lee, Assistant Professor, Department of Computer Science, Utah State University, kyumin.lee [at] usu.edu

James Caverlee, Associate Professor, Department of Computer Science and Engineering, Texas A&M University, caverlee [at] cse.tamu.edu

Calton Pu, Professor, School of Computer Science, Georgia Institute of Technology, calton [at] cc.gatech.edu

Topic and Description

The past few years have seen the rapid rise of many successful social systems – from Web-based social networks (e.g., Facebook, LinkedIn) to online social media sites (e.g., Twitter, YouTube) to large-scale information sharing communities (e.g., reddit, Yahoo! Answers) to crowd-based funding services (e.g., Kickstarter, IndieGoGo) to Web-scale crowdsourcing systems (e.g., Amazon MTurk, Crowdflower).

However, with this success has come a commensurate wave of new threats, including bot-controlled accounts that disseminate malware and commercial spam through social media systems, adversarial propaganda campaigns designed to sway public opinion, collective attention spam targeting popular topics and memes, and the propagation of manipulated content.

This tutorial will introduce peer-reviewed research on information quality in social systems. Specifically, we will address new threats such as social spam, campaigns, misinformation and crowdturfing, and overview modern techniques for improving information quality by revealing and detecting malicious participants (e.g., social spammers, content polluters and crowdturfers) and low-quality content. In addition, the tutorial will overview tools for detecting these participants.

Structure of the Proposed Tutorial (Half day)

1. Introduction to Social Spam, Campaigns, Misinformation and Crowdturfing (15min)

o   Overview of this tutorial

o   What are social spam, campaigns, misinformation and crowdturfing? Real examples of each.

o   Why is social spam different from traditional spam such as email and web spam? For example:

o   Openness: anyone can create a social account and easily contact other users.

o   URL blacklists are too slow at identifying new threats, allowing more than 90% of visitors to view a page before it becomes blacklisted [1].

o   URL shortening services enable obfuscation of spam links (see the sketch after this list).

o   Bots can be controlled automatically through platform APIs.
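To make the URL-shortening point concrete, below is a minimal Python sketch of why naive blacklist matching fails on shortened links: the raw short URL never matches the list, so a defender must first resolve the redirect chain and check the landing domain. The `requests` dependency and the blacklist entries are assumptions made for illustration.

```python
# Minimal sketch: a raw shortened URL never appears on a domain
# blacklist, so resolve the redirect chain first and check where
# it actually lands. The blacklist entries here are hypothetical.
from urllib.parse import urlparse

import requests  # assumed third-party dependency

BLACKLIST = {"malware-example.com", "phish-example.net"}  # placeholder

def is_blacklisted(url: str, timeout: float = 5.0) -> bool:
    """Follow redirects (e.g., a bit.ly link) and test whether the
    final landing domain appears on the blacklist."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
    except requests.RequestException:
        return False  # unreachable: treat as unknown, not as spam
    return urlparse(resp.url).netloc.lower() in BLACKLIST

# "https://bit.ly/abc123" itself matches nothing on the list; only
# the resolved landing domain can be checked.
```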

2. State-of-the-Art in Research on the Threats and Defenses

2.1. Social Spam (45min)

In this session, we will overview various social spam detection approaches:

o   How to detect suspicious URLs [2].

o   Social capitalists have helped spammers become well-established users and accumulate social signals, even as those spammers spread spam over social networks. We will explain why this happens and how to penalize not only the spammers but also these social capitalists [3].

o   Supervised classification is the most popular approach to detecting social spammers and spam messages. Examples of social spam detected by classification include YouTube video spam [4], Twitter spam [5], Foursquare spam tips [6] and collective attention spam [10] (a minimal classifier sketch follows this list).

o   Social honeypots have been proposed to monitor spammers' behaviors and collect their profile information [7].

o   Using the wisdom of the crowd to identify social spammers and sybil accounts [8].

o   Unsupervised social spam detection approach [9].
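As a companion to the supervised approach above, here is a minimal Python sketch of a supervised spammer classifier. The four features and the tiny training set are illustrative placeholders only; systems such as [5] and [7] use much richer profile, content, and behavioral features, and scikit-learn is an assumed dependency.

```python
# Minimal sketch of supervised social spammer classification.
# Features and training rows are fabricated; [5] and [7] use far
# richer profile, content, and behavioral features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Per-account features: [followers/following ratio,
#   fraction of posts containing URLs, posts per day,
#   fraction of near-duplicate posts]
X = np.array([
    [0.05, 0.95, 120.0, 0.80],   # bot-like account
    [0.02, 0.90, 200.0, 0.90],
    [1.10, 0.10,   5.0, 0.01],   # ordinary user
    [0.90, 0.20,   8.0, 0.02],
])
y = np.array([1, 1, 0, 0])       # 1 = spammer, 0 = legitimate

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

# A new account that follows many users, posts URLs heavily, and
# repeats itself is classified as a spammer:
print(clf.predict([[0.03, 0.85, 150.0, 0.70]]))  # -> [1]
```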

2.2. Campaigns (40min)

We will introduce how these malicious participants form groups and run campaigns to target social systems more effectively, and overview campaign detection approaches:

o   Graph-based social spam campaign detection [11].

o   Content-driven campaign detection [12][13] (a small sketch follows this list).

o   Detect and track political campaigns in social media by using a classification approach [14].

o   Frequent itemset mining method with behavioral models to detect fake reviewer groups [15].
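The content-driven idea in [12] can be illustrated with a small sketch: treat messages as token sets, connect near-duplicates, and read candidate campaigns off the connected components. The 0.5 Jaccard threshold, the toy messages, and the networkx dependency are all assumptions of this sketch, not parameters from the paper.

```python
# Sketch of content-driven campaign grouping, loosely following the
# idea in [12]: connect near-duplicate messages, then treat each
# connected component as a candidate campaign.
from itertools import combinations

import networkx as nx  # assumed third-party dependency

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

messages = [
    "win a free iphone click here now",
    "click here now to win a free iphone",
    "great paper on spam detection at WWW",
    "win free iphone now click this link",
]
tokens = [set(m.lower().split()) for m in messages]

g = nx.Graph()
g.add_nodes_from(range(len(messages)))
for i, j in combinations(range(len(messages)), 2):
    if jaccard(tokens[i], tokens[j]) >= 0.5:  # near-duplicate content
        g.add_edge(i, j)

campaigns = [c for c in nx.connected_components(g) if len(c) > 1]
print(campaigns)  # -> [{0, 1, 3}]: one candidate "free iphone" campaign
```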

2.3. Misinformation (30min)

Can we trust information generated on social systems? This session will introduce what kinds of misinformation exist on social systems, and survey possible approaches to detecting it:

o   Measuring information credibility on social media using classification approaches over crowd-labeled data [16] (a feature-extraction sketch follows this list).

o   An automatic rumor detection approach on Sina Weibo, China's leading micro-blogging service [17].

o   Identifying fake images circulated on Twitter during Hurricane Sandy [18].

o   Methods for assessing information credibility in emergency situations, consisting of an unsupervised approach and a supervised approach to detecting message credibility [19].
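To give a flavor of the classification approaches above, the sketch below extracts a handful of credibility features of the kind used in [16] (message-, user-, and propagation-level signals). The specific feature names and the input dictionary layout are illustrative assumptions, not the paper's exact feature set.

```python
# Sketch of credibility feature extraction in the spirit of [16],
# which feeds message-, user-, and propagation-level features to a
# supervised classifier. This is a small illustrative subset.
def credibility_features(tweet: dict) -> dict:
    text = tweet["text"]
    user = tweet["user"]
    return {
        "length": len(text),
        "has_url": "http" in text,
        "num_exclamations": text.count("!"),
        "has_question": "?" in text,
        "followers": user["followers_count"],
        "account_age_days": user["age_days"],
        "num_retweets": tweet["retweet_count"],
    }

example = {
    "text": "BREAKING!!! shark swimming on the highway http://t.co/x",
    "user": {"followers_count": 12, "age_days": 3},
    "retweet_count": 900,
}
print(credibility_features(example))
# Few followers + a brand-new account + heavy propagation is a
# classic low-credibility signature flagged by such classifiers.
```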

2.4. Crowdturfing (35min)

Recently, malicious participants have begun to take advantage of crowd power to spread manipulated information over social systems. This session will overview real examples of weaponized crowdsourcing, along with techniques to identify manipulated content and the crowd workers who spread it on behalf of requesters:

o   Introduce real examples reported by the news media.

o   Understand what kinds of crowdturfing tasks are available on crowdsourcing sites [20][21].

o   Estimate the size of the crowdturfing market on both Eastern and Western crowdsourcing sites [21][22].

o   Track and reveal crowdsourced manipulation of social media, focusing in particular on Western crowdsourcing sites and on how to detect crowdturfers on social media [21] (see the sketch after this list).
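A first step in tracking crowdsourced manipulation, in the spirit of [21], is linking crowdsourcing tasks to the social media accounts they target. The sketch below scans task descriptions for Twitter/Facebook links with a regular expression; the task texts and the pattern are fabricated examples, not the paper's pipeline.

```python
# Sketch of the first step in tracking crowdsourced manipulation,
# in the spirit of [21]: scan crowdsourcing task descriptions for
# links pointing at social media targets. Task texts are fabricated.
import re

SOCIAL_TARGET = re.compile(
    r"https?://(?:www\.)?(?:twitter|facebook)\.com/[\w./?=%-]+", re.I)

tasks = [
    "Follow https://twitter.com/brand_x and retweet the pinned post, $0.10 each",
    "Write a 300-word article about travel insurance",          # legitimate
    "Like http://www.facebook.com/shop.y and comment 'great product!'",
]

for task in tasks:
    targets = SOCIAL_TARGET.findall(task)
    if targets:  # this task pays workers to manipulate a social account
        print(targets)
# -> ['https://twitter.com/brand_x']
# -> ['http://www.facebook.com/shop.y']
```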

3. Challenges, Opportunities and Tools in Social Spam, Campaigns, Misinformation and Crowdturfing Research (15min)

o   Review of open research challenges: the need for large, accurate, up-to-date data sets, and the integration of multiple techniques and research areas.

o   Data management challenge: protecting users (ethics, privacy and related concerns) when building and releasing public data sets.

o   Introduce useful tools for conducting research in the area:

o   Big data analysis (e.g., MapReduce, Pig, Hive)

o   Machine learning (e.g., Weka, Mallet)

o   Visualization (e.g., Matplotlib, Graphviz); a brief plotting example follows this list.
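As a small taste of the visualization tooling, the Matplotlib sketch below plots the kind of comparison often made in this literature, e.g., follower growth of a suspected content polluter versus a legitimate user. All data points are fabricated for illustration.

```python
# Tiny Matplotlib sketch of an exploratory plot common in this
# research area: follower growth of a suspected content polluter
# versus a legitimate user. All data points are fabricated.
import matplotlib.pyplot as plt

days = list(range(0, 70, 10))
polluter_followers = [0, 800, 1600, 2400, 3100, 3900, 4700]  # farmed links
legit_followers = [0, 15, 40, 70, 95, 130, 160]              # organic growth

plt.plot(days, polluter_followers, "r--", label="suspected polluter")
plt.plot(days, legit_followers, "b-", label="legitimate user")
plt.xlabel("days since account creation")
plt.ylabel("followers")
plt.legend()
plt.savefig("follower_growth.png")  # or plt.show() interactively
```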

References

[1] Grier, C., Thomas, K., Paxson, V., and Zhang, M. @spam: the underground on 140 characters or less. In CCS, 2010.

[2] Lee, S., and Kim, J. WarningBird: Detecting suspicious URLs in Twitter stream. In NDSS, 2012.

[3] Ghosh, S., Viswanath, B., Kooti, F., Sharma, N. K., Korlam, G., Benevenuto, F., Ganguly, N., and Gummadi, P. K. Understanding and combating link farming in the Twitter social network. In WWW, 2012.

[4] Benevenuto, F., Rodrigues, T., Almeida, V., Almeida, J., and Gonçalves, M. Detecting spammers and content promoters in online video social networks. In SIGIR, 2009.

[5] Lee, K., Eoff, B., and Caverlee, J. Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter. In ICWSM, 2011.

[6] Aggarwal, A., Almeida, J., and Kumaraguru, P. Detection of spam tipping behaviour on Foursquare. In WWW Companion, 2013.

[7] Lee, K., Caverlee, J., and Webb, S. Uncovering Social Spammers: Social Honeypots + Machine Learning. In SIGIR, 2010.

[8] Wang, G., Mohanlal, M., Wilson, C., Wang, X., Metzger, M. J., Zheng, H., and Zhao, B. Y. Social Turing Tests: Crowdsourcing Sybil Detection. In NDSS, 2013.

[9] Tan, E., Guo, L., Chen, S., Zhang, X., and Zhao, Y. UNIK: Unsupervised Social Network Spam Detection. In CIKM, 2013.

[10] Lee, K., Kamath, K., and Caverlee, J. Combating Threats to Collective Attention in Social Media: An Evaluation. In ICWSM, 2013.

[11] Gao, H., Hu, J., Wilson, C., Li, Z., Chen, Y., and Zhao, B. Detecting and characterizing social spam campaigns. In IMC, 2010.

[12] Lee, K., Caverlee, J., Cheng, Z., and Sui, D. Content-Driven Detection of Campaigns in Social Media. In CIKM, 2011.

[13] Lee, K., Caverlee, J., Cheng, Z., and Sui, D. Campaign Extraction from Social Media. ACM TIST, Vol. 5, No. 1, January 2014.

[14] Ratkiewicz, J., Conover, M., Meiss, M., Gonçalves, B., Flammini, A., and Menczer, F. Detecting and Tracking Political Abuse in Social Media. In ICWSM, 2011.

[15] Mukherjee, A., Liu, B., and Glance, N. Spotting fake reviewer groups in consumer reviews. In WWW, 2012.

[16] Castillo, C., Mendoza, M., and Poblete, B. Information credibility on Twitter. In WWW, 2011.

[17] Yang, F., Liu, Y., Yu, X., and Yang, M. Automatic detection of rumor on Sina Weibo. In SIGKDD Workshop on Mining Data Semantics, 2012.

[18] Gupta, A., Lamba, H., Kumaraguru, P., and Joshi, A. Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy. In WWW Companion, 2013.

[19] Xia, X., Yang, X., Wu, C., Li, S., and Bao, L. Information credibility on Twitter in emergency situation. In PAISI, 2012.

[20] Motoyama, M., McCoy, D., Levchenko, K., Savage, S., and Voelker, G. M. Dirty jobs: the role of freelance labor in web service abuse. In USENIX Security, 2011.

[21] Lee, K., Tamilarasan, P., and Caverlee, J. Crowdturfers, Campaigns, and Social Media: Tracking and Revealing Crowdsourced Manipulation of Social Media. In ICWSM, 2013.

[22] Wang, G., Wilson, C., Zhao, X., Zhu, Y., Mohanlal, M., Zheng, H., and Zhao, B. Y. Serf and turf: crowdturfing for fun and profit. In WWW, 2012.