The Seshat Global History Databank brings together the most current and
comprehensive body of knowledge about human history in one place.
Our unique Databank systematically collects what is currently known
about the social and political organization of human societies and
how civilizations have evolved over time.
This dataset contains two different social networks: Twitter, a micro-blogging platform with exponential growth and extremely fast dynamics, and Tom’s Hardware, a worldwide forum network focusing on new technology with more conservative dynamics but distinctive features.
467 million Twitter posts from 20 million users, covering a 7-month period from June 1, 2009 to December 31, 2009.
We estimate this is about 20-30% of all public tweets published on Twitter during that time frame.
At Twitter's request, the data is no longer available.
The Higgs dataset has been built after monitoring
the spreading processes on Twitter before, during and after
the announcement of the discovery of a new particle
with the features of the elusive Higgs boson on 4th July 2012.
The messages posted on Twitter about this discovery between 1st and 7th July 2012 are considered.
Patent citation networks (these are available and reasonably well annotated)
Wikipedia articles and their references (readily available)
also includes easily-parseable mathematical data and theorems
…and edit trails
…and category annotations
and semantic metadata
probably more data than you can use
Source code of large collaborative projects (Linux or BSD kernel,
OpenOffice, Python, Perl, GCC etc.)
Can I parse such projects to see how interfaces form?
Are there odd stylised facts about contribution to these that I might be
able to explain?
This is possibly low-hanging fruit for me - I’ve got a fair bit of
experience with parsing and SCM-wrangling. But is software engineering a good
proxy for physical engineering? Can I parse other technical standards in the
same way? (I know a few engineers - must ask them)
Journal article cross-references. This is over-studied.
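
Going back to the source-code idea above: a first pass at "stylised facts about contribution" could be as simple as counting commits per author and asking how concentrated they are. This is a minimal sketch, not a worked-out method. It assumes the history has already been dumped to text with `git log --format=%an` (one author name per line). The function names `commits_per_author` and `top_share` and the sample log are made up for illustration.

```python
import math
from collections import Counter


def commits_per_author(log_text: str) -> Counter:
    """Count commits per author.

    Assumes `log_text` is the output of `git log --format=%an`,
    i.e. one author name per line, one line per commit.
    """
    names = (line.strip() for line in log_text.splitlines())
    return Counter(name for name in names if name)


def top_share(counts: Counter, fraction: float = 0.1) -> float:
    """Share of all commits made by the most active `fraction` of authors.

    At least one author is always included, so tiny projects still
    give a meaningful number.
    """
    if not counts:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical sample standing in for real `git log --format=%an` output.
sample_log = "alice\nalice\nalice\nbob\ncarol\nalice\nbob\n"
counts = commits_per_author(sample_log)
share = top_share(counts)  # share of commits held by the top 10% of authors
```

If the share stays heavily skewed across projects such as the Linux kernel or GCC, that would be a candidate stylised fact to explain. Author names are a noisy identifier (people commit under several names), so any real analysis would need to deduplicate them first.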