MCLC: when and why tweets are deleted

Denton, Kirk denton.2 at osu.edu
Thu May 31 08:35:47 EDT 2012


MCLC LIST
From: Anne Henochowicz <annemh at alumni.upenn.edu>
Subject: when and why tweets are deleted
***********************************************************

Source: Nieman Journalism Lab (5/30/12):
http://www.niemanlab.org/2012/05/reverse-engineering-chinese-censorship-when-and-why-are-controversial-tweets-deleted/

Reverse engineering Chinese censorship: When and why are controversial
tweets deleted? 
By Andrew Phelps <http://www.niemanlab.org/author/aphelps/>

An MIT student is working to detect patterns in the disappearance of
thousands of weibos from the Chinese Internet.

Censoring the Chinese Internet must be exhausting work, like trying to
stem the flow of a fire hose with your thumb. Sina Weibo, a popular
Twitter-like service, says
<http://tech.sina.com.cn/i/2012-05-15/12307109653.shtml> its 300 million
registered users post more than 100 million weibos, or tweet-like posts, a
day. (In Chinese, weibo means microblog or microblog post.)

And of course the entire Chinese Internet isn't as censored
<http://www.ethanzuckerman.com/blog/2011/12/28/exploring-the-chinese-internet-with-weiboscope/>
as some might think. So why are some tweets deleted,
not others? Which topics are seen as the biggest threat to harmony?

Chi-Chu Tschang <https://twitter.com/#!/tschang> wants to unwrap the black
box. Tschang, a former China-based correspondent for BusinessWeek, is an
MBA student at MIT's Sloan School and a student in Ethan Zuckerman's class
this semester, "News in the Age of Participatory Media
<http://partnews.mit.edu/>." For his final project, Tschang built on data
collected on thousands of deleted weibos in China to look for answers. (I
summarized some other interesting ideas from students
<http://www.niemanlab.org/2012/05/3-new-ideas-on-the-future-of-news-from-mit-media-lab-students/>
in a previous post.)

"We know that certain topics are censored from blogs hosted in China,
Chinese search engines and Weibos," Tschang writes in his paper. "But we
don't know where the line lies. Part of the reason is because the line is
constantly moving."

Tschang drew on the work of researchers at the University of Hong Kong's
Journalism and Media Studies Center <http://jmsc.hku.hk/>. Cedric Sam and
King-wa Fu helped build WeiboScope
<http://research.jmsc.hku.hk/social/obs.py/sinaweibo/>, which visualizes
the most popular content on Sina Weibo in something close to real time. On
top of that app, they built WeiboScope Search
<http://research.jmsc.hku.hk/social/search.py/sinaweibo/#lastpermissiondenied>,
which includes deleted weibos (more than 12,000 since Feb. 1) in
its huge archive.

Using the data visualization software Tableau
<http://www.tableausoftware.com/>, Tschang plotted those deleted weibos on
a timeline, then superimposed politically sensitive events to provide
context.
 <http://www.niemanlab.org/images/censored_weibo_timeline1.png>
The day that saw the highest volume of deletions, in a dataset covering
Feb. 1 to May 20, was March 8: the day rumors of Bo Xilai's fall from
power began to spread. Bo <http://en.wikipedia.org/wiki/Bo_Xilai> was a
high-ranking party secretary who was under scrutiny for, among other
things, his tremendous apparent wealth. Bo's son, studying here at
Harvard, attracted a lot of attention when he reportedly picked up Jon
Huntsman's daughter in a red Ferrari
<http://www.nytimes.com/2012/04/26/world/asia/bo-guagua-tries-to-defuse-sports-car-scandal.html>
for a date.

The second-busiest censorship day was March 15, the day Bo was sacked.

Here's one more interesting data point: On March 18, word spread of a
deadly car accident involving a Ferrari (a black one, not a red one).
Nearly all information about the crash disappeared from the Internet
<http://www.theatlantic.com/international/archive/2012/03/an-astounding-article-in-global-times/254762/>,
fueling speculation about who was involved.
Even the word "Ferrari" was censored. Tschang observed moderate deletion
activity that day on Sina Weibo.

There is one day of missing data: April 22, the day civil-rights activist
Chen Guangcheng escaped from house arrest
<http://www.aljazeera.com/news/asia-pacific/2012/04/20124294064187.html>
in Shandong. Why? An error message dated April 23, the day after, reports
"load problems" that temporarily disabled data collection. The timing is
disappointing. It could be that the Chinese Weibosphere was so jammed on that
momentous day that the servers were crashing. Or it could be something
else entirely. (Reader Samuel Wade <http://twitter.com/samuel_wade> notes
that news of Chen's escape was not widely known until days later
<http://chinadigitaltimes.net/2012/04/activists-chen-guangcheng-flees-house-arrest/>.)

Tschang crunched the raw data and generated a word cloud, to see which
terms in deleted weibos appear most often.
 <http://www.niemanlab.org/images/top-73-censored-weibo.jpg>
Word clouds, though pretty, don't provide a whole lot of context. Tschang
said he wants to examine the list more carefully, filtering out words like
the Chinese equivalents of "RT" and "ha ha." He also wants to examine the
relationships of the 3,500 most censored Weibo users, creating, I don't
know, a Klout for civil disobedience?
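
The filtering step Tschang describes could look roughly like the sketch
below. It is only an illustration: the posts are made-up and already split
into words, and the two filler tokens (assumed here to stand in for the
Chinese equivalents of "RT" and "ha ha") are this example's guess, not
Tschang's actual list.

```python
from collections import Counter

# Assumed filler tokens to drop before counting (hypothetical list):
# "转发" for "repost" and "哈哈" for "ha ha".
FILLER = {"转发", "哈哈"}

# Made-up deleted posts, already split into words. Real Chinese text would
# first need a word segmenter, which is outside the scope of this sketch.
posts = [
    ["转发", "薄熙来", "重庆"],
    ["哈哈", "法拉利"],
    ["转发", "薄熙来", "法拉利"],
]

# Count every remaining word across all posts.
term_counts = Counter(
    token for post in posts for token in post if token not in FILLER
)

# The most frequent terms are what a word cloud would draw largest.
top_terms = term_counts.most_common(2)
print(top_terms)
```

With the filler removed, the counts reflect topics rather than reposting
habits, which is the extra context a raw word cloud lacks.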

Tschang's hypothesis, that Sina Weibo deletions correlate highly with
spikes in media coverage of sensitive stories, is consistent with the
findings of a similar study
<http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3943/3169>
from researchers at Carnegie Mellon University, who evaluated 56
million weibos, of which about 16 percent were deleted.

Those researchers found some key words were far more likely to get a weibo
deleted: Ministry of Truth, Falun Gong, Ai Weiwei, Playboy, to name a few.
"By revealing the variation that occurs in censorship both in response to
current events and in different geographical areas," the researchers
wrote, "this work has the potential to actively monitor the state of
social media censorship in China as it dynamically changes over time."

Finally, Tschang also evaluated how long it took for deleted weibos to be
deleted. He wrote:

The fastest a post was deleted on Sina Weibo was just over 4 minutes. The
longest time it took for the censor to get around deleting a message on
Sina Weibo was over four months. For the posts created on May 20, 2012 and
deleted on the same day, it took on average 11 hours for Weibo Scope
Search to detect the deletion.

Tschang said he suspects some weibos get deleted months later because they
are about topics that suddenly re-surface in Chinese media.
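
As a rough illustration of how a figure like that 11-hour average could be
computed, here is a minimal sketch. The timestamps are invented for the
example and the record layout (creation time, time the deletion was
detected) is an assumption, not the WeiboScope Search schema.

```python
from datetime import datetime
from statistics import mean

# Made-up (created, deletion detected) pairs; not real WeiboScope data.
records = [
    (datetime(2012, 5, 20, 8, 0), datetime(2012, 5, 20, 19, 0)),
    (datetime(2012, 5, 20, 9, 30), datetime(2012, 5, 20, 21, 30)),
    (datetime(2012, 5, 20, 10, 0), datetime(2012, 5, 21, 1, 0)),
]


def hours_to_deletion(created: datetime, deleted: datetime) -> float:
    """Return the gap between posting and detected deletion, in hours."""
    return (deleted - created).total_seconds() / 3600


# Keep only posts created and deleted on the same calendar day.
same_day = [
    hours_to_deletion(created, deleted)
    for created, deleted in records
    if created.date() == deleted.date()
]

average_hours = mean(same_day)
print(f"{len(same_day)} same-day deletions, average {average_hours} hours")
```

Note that this measures when a deletion was detected, not when the censor
acted, so it is an upper bound on the real delay.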

Tschang even tried posting spare, scandalous messages to his own Sina
Weibo account <http://www.weibo.com/u/2470768677>, just to see what would
happen.

* Chen Guangcheng
* Bo Xilai
* Taiwanese independence

Here's Tschang:

Less than 14 hours later, I received a message from Sina Weibo's system
administrator informing me that my two posts on "Chen Guangcheng" were
"inappropriate" and had been censored. While I can still see the two "Chen
Guangcheng" posts on my Sina Weibo account page, no one else can.
Surprisingly, my posts on "Bo Xilai" and "Taiwan independence" were not
censored.

One caveat: Tschang cannot be 100 percent sure that a deleted weibo wasn't
deleted by its creator, rather than Sina's "monitoring editors." But Sina
Weibo's API makes a helpful distinction in the way it returns data for
deleted weibos. The error message for a non-existent weibo comes back as
either "Weibo does not exist" or "Permission denied." So one could assume,
as do Tschang and the HKU researchers, that "permission denied" equals
"censored."
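
That assumption can be written down as a small classifier. This is a sketch
only: the two message strings are the ones quoted above, but the function
name, the labels and the sample input are hypothetical, and no real Sina
Weibo API call is made.

```python
from collections import Counter

# The two error messages quoted in the article.
PERMISSION_DENIED = "Permission denied"
DOES_NOT_EXIST = "Weibo does not exist"


def classify_deletion(error_message: str) -> str:
    """Label a missing weibo by its error message (labels are assumptions)."""
    text = error_message.strip().lower()
    if text == PERMISSION_DENIED.lower():
        # Assumed, as Tschang and the HKU researchers do, to mean censored.
        return "censored"
    if text == DOES_NOT_EXIST.lower():
        # The post is gone, but the message does not say who removed it.
        return "not_found"
    return "unknown"


# Made-up error messages standing in for API responses.
sample_errors = [
    "Permission denied",
    "Weibo does not exist",
    "permission denied",
    "timeout",
]
counts = Counter(classify_deletion(message) for message in sample_errors)
print(counts)
```

Anything that matches neither message is kept as "unknown" rather than
guessed at, since the caveat about who deleted a post still applies.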

And the best time to weibo something politically sensitive in China? After
11 o'clock on a Friday night, according to the data.
"Interestingly, deletion of Sina Weibo messages tend to hit a low on
Saturdays," Tschang wrote. "I'm not too sure why that is, except that
maybe censors want to take time off on weekends as well."
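
A day-of-week pattern like this one falls out of a simple tally. The sketch
below uses invented deletion timestamps, not Tschang's dataset, and only
shows the counting step.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Made-up deletion timestamps from one week in May 2012.
deletion_times = [
    datetime(2012, 5, 14, 10, 15),  # Monday
    datetime(2012, 5, 14, 14, 0),   # Monday
    datetime(2012, 5, 16, 9, 0),    # Wednesday
    datetime(2012, 5, 16, 22, 30),  # Wednesday
    datetime(2012, 5, 18, 11, 0),   # Friday
    datetime(2012, 5, 18, 16, 45),  # Friday
    datetime(2012, 5, 19, 13, 0),   # Saturday
]

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
by_weekday = Counter(DAY_NAMES[t.weekday()] for t in deletion_times)

# Among days that saw any deletions, find the one with the fewest.
quietest_day = min(by_weekday, key=by_weekday.get)
print(by_weekday, quietest_day)
```

Grouping by `t.hour` instead of `t.weekday()` would give the hour-of-day
view behind the "after 11 o'clock" observation.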


 



