Michael Paul
Multi-Faceted Topic Modeling Package

This software includes implementations of cross-collection latent Dirichlet allocation (ccLDA) and the topic-aspect model (TAM), introduced and described in the papers below, as well as an LDA implementation. See the included README for open licensing information as well as usage instructions and input/output formatting guidelines.

TAM and ccLDA belong to the class of multi-faceted topic models, which learn how topics vary across some other variable, such as a document's collection, the author's perspective, or a time span. For example, topics in scientific literature might appear across multiple disciplines but in different ways in each field, so the topical words in a document would also depend on the paper's primary discipline. Topics found in reviews and editorials might be expressed in different ways depending on the author's perspective or sentiment. ccLDA captures these properties with explicit document labels, while TAM tries to learn this other latent dimension automatically. In more recent work we found TAM to be useful for unsupervised viewpoint clustering.

This implementation provides only minimal functionality. In particular, it does not provide a method for running inference on new documents, and it does not allow asymmetric topic priors. I may or may not add these things in a future release. (Note: this implementation of ccLDA does not learn an asymmetric alpha matrix as in the original paper. We found that it mostly learned sparse priors, and that this was not too important. If you desire this functionality, it can be found in this older implementation of ccLDA.)

Please contact me if you find any bugs or errors. It may be a good idea to check back every once in a while for updates, especially in case bugs are discovered.

Revision History
References