Waking and Sleeping

“Good morning” and “good night” tweets, mapped in real time. A project by Katherine Yang, open-sourced on GitHub.

Created during an intensely virtual, remote, and global time, this project is a reflection on how interconnected the world is and on our daily waves and flares of activity and wakefulness across the globe. The hope is that you can open the webpage at any time of day or night and see where in the world people are waking up and going to sleep. It was inspired largely by Lin-Manuel Miranda’s series of “gmorning, gnight” tweets and his book with Jonny Sun, Gmorning, Gnight!: Little Pep Talks for Me & You :)

Photo of a spread in Gmorning, Gnight! The New York Public Library Shop

On geotagging

This project relies on tweets that are geotagged, either with exact coordinates or with a “place”. Coordinates, the most granular location type (as specific as a street intersection), are given by the phone’s GPS as the place where someone is when they send the tweet. Place, a coarser location type, is a named location (business, city, country) around someone that they can choose before sending the tweet.
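
To make the two location types concrete, here is a minimal sketch of reading a position out of a tweet. It assumes the tweet arrives as a dictionary shaped like Twitter's v1.1 tweet JSON, with a `coordinates` GeoJSON point and a `place` carrying a `bounding_box`. The function name `tweet_location` is hypothetical, and taking the centre of the bounding box for a place is just one simple choice, not necessarily what this project does.

```python
from typing import Optional, Tuple


def tweet_location(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a tweet, or None if it is not geotagged."""
    # Exact coordinates: a GeoJSON point, stored as [longitude, latitude].
    point = tweet.get("coordinates")
    if point:
        lon, lat = point["coordinates"]
        return (lat, lon)

    # Place: fall back to the centre of the place's bounding box (an assumption).
    place = tweet.get("place")
    if place and place.get("bounding_box"):
        corners = place["bounding_box"]["coordinates"][0]  # list of [lon, lat] pairs
        lons = [corner[0] for corner in corners]
        lats = [corner[1] for corner in corners]
        return ((min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2)

    # Neither field is set: the tweet carries no location.
    return None
```

A tweet with exact coordinates keeps its street-level precision, while a tweet tagged only with a city-sized place collapses to a single point in the middle of that city, which is worth remembering when reading the map.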

Geotagging is an incredible feature for researchers and data visualisers (not to mention, of course, businesses and advertisers). The “big data” method of data collection used so widely across platforms (“collect anything and everything that might be remotely insightful”) allows a researcher to see detailed geospatial trends that are happening *right this moment* with just a few lines of code, whereas a pre-internet researcher may have had to ride a bicycle around a region for years to amass a large-ish amount of data.

However, while just over 3.5% of tweets were geotagged in 2012 (three years after geotagging was introduced on Twitter), that percentage has dropped steadily to around 1.5% of tweets since 2017 (Kalev Leetaru, Forbes; the charts are fascinating, by the way, if you’re interested!). Our decreasing trust and interest in geotagging hints at the deep concerns we have about privacy today. There’s a lack of transparency (we don’t know when our location is being collected, how often, and in how much detail) and a lack of agency (invisible defaults and slippery opt-outs instead of meaningful opt-ins can overwhelm and blindside).

Percent of tweets in the Twitter 1% stream that were geotagged (GPS + Place). Kalev Leetaru, Forbes

Is this project an ethical use of the data I’m able to access through Twitter? Without knowing whether all of the people are aware of and consenting to being geotagged (and rather suspecting many aren’t), it’s hard to say yes, but hopefully only showing truncated tweets without any personally revealing information provides enough anonymity for this to pass for the moment.

Some limitations

Owing to the limitations of both Twitter’s standard API plan and Twitter’s own data parsing, there are a couple of major factors that make the project less comprehensive and accurate than hoped.

“Non-space separated languages, such as CJK are currently unsupported.” (Standard stream parameters)

This project uses Twitter’s Streaming API, with the “statuses/filter” endpoint. This means that a connection is constantly open between the app and Twitter, through which Twitter sends, in real time, tweets that match the keywords I specify in the “track” parameter. The note above hints at spaces being a challenging technical hurdle: Twitter’s algorithm for breaking a tweet down into keywords doesn’t know what to do with a tweet written in a language like Chinese, Japanese, or Korean, which may lack word-separating spaces.

Unfortunately, by multiple metrics, this leaves out a large population of people and internet users. On total number of speakers: the combined varieties of Chinese rank 1st, Japanese is 13th, and Korean is 22nd (2019 Ethnologue). On languages on the internet: Chinese is 2nd and Japanese is 8th (Statista). On languages on Twitter: Japanese is 2nd, Thai (another non-space separated language) is 9th, and Korean is 10th (Statista via Mashable).
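
The whitespace problem can be illustrated with a small sketch. It assumes, purely for illustration, that keyword matching splits a tweet on whitespace and looks for the keyword among the resulting tokens; Twitter's real tokeniser is not public, and the function names here are hypothetical.

```python
def whitespace_tokens(text: str) -> set:
    """Split text into lowercase tokens on whitespace only (a simplifying assumption)."""
    return set(text.lower().split())


def contains_keyword(text: str, keyword: str) -> bool:
    """True if the keyword appears as a whole whitespace-delimited token."""
    return keyword.lower() in whitespace_tokens(text)


# English: "morning" is its own token, so it is found.
print(contains_keyword("Good morning everyone", "morning"))  # True

# Japanese: "おはよう" (good morning) sits inside an unspaced sentence,
# so the whole sentence is a single token and the keyword is never found.
print(contains_keyword("みなさんおはようございます", "おはよう"))  # False
```

Under this assumption, a Japanese “good morning” tweet is invisible to the filter even though the keyword is plainly there to a human reader.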

Chart: Only 34% of All Tweets are in English. Statista via Mashable

“Exact matching of phrases (equivalent to quoted phrases in most search engines) is not supported.” (Standard stream parameters)

Twitter matches a tweet as long as all the words in one of the key phrases appear somewhere in the tweet, regardless of order. This means that the key phrase “good night” might match a tweet posted in the morning that says, “last night was good”, resulting in some confusing or self-contradicting points in the viz.
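
The difference between the two kinds of matching can be sketched as follows. `track_style_match` approximates the behaviour described above (every word present, in any order), and `exact_phrase_match` is one possible client-side check that could be applied to tweets after they arrive. Both function names are hypothetical, the word-splitting rule is a simplification for English text, and nothing here is stated to be part of the project itself.

```python
import re


def words(text: str) -> list:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def track_style_match(tweet: str, phrase: str) -> bool:
    """Approximate the matching described above: every phrase word present, any order."""
    return set(words(phrase)) <= set(words(tweet))


def exact_phrase_match(tweet: str, phrase: str) -> bool:
    """Stricter check: the phrase words must appear consecutively and in order."""
    tweet_words, phrase_words = words(tweet), words(phrase)
    n = len(phrase_words)
    return any(
        tweet_words[i:i + n] == phrase_words
        for i in range(len(tweet_words) - n + 1)
    )
```

With these definitions, “last night was good” passes `track_style_match` for the phrase “good night” but fails `exact_phrase_match`, while “Good night, world!” passes both. A post-filter like this would trade some missed tweets (for example “good, good night”-style variations it does not anticipate) for fewer contradictory points.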

Resulting musings