Michael J. Paul
Here is an archive of my academic work (see also Google Scholar):
Research
Teaching
Group
Resources
Book
Code
SPRITE
Factorial LDA
Topic modeling for document sequences
Java implementations of the Block HMM and Mixed Membership Markov Model (M4)
Multi-Faceted Topic Modeling Package
Java implementations of Cross-Collection LDA and the Topic-Aspect Model
Carmen: Geolocation for Twitter
Data
Zika Tweets, 2015-2016
Data used in Pruss et al., PLOS ONE 2019.
Tweets for Survey Prediction
Data used in Benton et al., AAAI 2016.
Weibo Air Pollution Dataset
Data used in Wang et al., JMIR 2015.
Health Tweets
Data used in Paul and Dredze, PLOS ONE 2014. Includes health-related tweet IDs and ATAM output.
Doctor Review Dataset
Data used in Wallace et al., JAMIA 2014.
Influenza Twitter Annotations
Data used in Lamb et al., NAACL 2013. Tweet IDs annotated with flu relevance.
Health Twitter Annotations
Data used in Paul and Dredze, ICWSM 2011. Tweet IDs annotated with health relevance.
Cross-Cultural Blog and Forum Dataset
Data used in Paul and Girju, EMNLP 2009.
Other Resources
The code from my group's recent projects can be found on the GitHub pages for Xiaolei Huang and Yoshinari Fujinuma.
Some annotated tweets are available from the SMM4H shared tasks.
See Mark Dredze's website for other code/data that I have worked with.