The US Navy is seeking to create an archive of at least 350 billion social media posts from around the world, in order to study how people talk online.
The military project team has not specified which social media platform it intends to collect the data from.
The posts must be publicly available, come from at least 100 different countries and include at least 60 different languages.
They should also date between 2014 and 2016.
The details were revealed in a tender document from the Naval Postgraduate School for a firm to provide the data.
Applications have now closed.
Additional requirements included:
- the posts must come from at least 200 million unique users
- no more than 30% can come from any single country
- at least 50% must be in a language other than English
- location information must be included in at least 20% of the records
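The three percentage thresholds in the list above lend themselves to a mechanical check. The following is a minimal sketch only: the tender does not describe a record format, so the dictionary keys `country`, `language` and `location`, the use of `"en"` as the code for English, and the function name are all assumptions made for illustration.

```python
from collections import Counter


def check_share_requirements(posts):
    """Check the three percentage-based requirements for a list of posts.

    Each post is a dict with hypothetical keys 'country', 'language'
    and (optionally) 'location'. These field names are assumptions.
    """
    total = len(posts)
    if total == 0:
        raise ValueError("no posts to check")

    # Largest fraction of posts contributed by any one country.
    country_counts = Counter(post["country"] for post in posts)
    top_country_share = max(country_counts.values()) / total

    # Fraction of posts not written in English (assumed code: "en").
    non_english_share = sum(post["language"] != "en" for post in posts) / total

    # Fraction of posts that carry any location information.
    located_share = sum(post.get("location") is not None for post in posts) / total

    return {
        "country_cap": top_country_share <= 0.30,  # no more than 30% from one country
        "non_english": non_english_share >= 0.50,  # at least 50% not in English
        "location": located_share >= 0.20,         # at least 20% with location
    }
```

Each value in the returned dictionary is `True` when the corresponding threshold is met. On an archive of the size described, the same counts would have to be computed in a streaming or distributed fashion rather than over an in-memory list.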
Private messaging and user information will not form part of the database.
“Social media data allows us, for the first time, to measure how colloquial expressions and slang evolve over time, across a diverse array of human societies, so that we can begin to understand how and why communities come to be formed around certain forms of discourse rather than others,” T Camber Warren, the project’s lead researcher, told Bloomberg.
The US Navy was behind the creation of Tor, the anonymous browsing network, in 2002.
Tor, also known as The Onion Router, aims to conceal where people go online by using encryption and randomly bouncing requests for web pages through a network of different computers.