Experiments in Graph-based Semi-Supervised Learning Methods for Class-Instance Acquisition: Datasets

Class-Instance Acquisition Datasets

This page hosts the datasets used in the paper:

Partha P. Talukdar, Fernando Pereira, Experiments in Graph-based Semi-Supervised Learning Methods for Class-Instance Acquisition, ACL 2010.

For details on the following datasets, please refer to the ACL 2010 paper above. You can use the following citation for these datasets:

 @conference{talukdar2010experiments,
      title={{Experiments in graph-based semi-supervised learning methods for class-instance acquisition}},
      author={Talukdar, P.P. and Pereira, F.},
      booktitle={Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics},
      pages={1473--1481},
      year={2010},
      organization={Association for Computational Linguistics}
 }

Freebase-1 & 2 Graphs

Freebase-1 Graph, Seeds and Evaluation Data
The seeds and evaluation datasets are taken from the gold dataset made available by the authors of:
M. Pennacchiotti and P. Pantel. Entity Extraction via Ensemble Semantics. EMNLP 2009.

Freebase-2 Graph, Seeds and Evaluation Data

Freebase-1 and Freebase-2 graphs are constructed from subsets of the Freebase data. These datasets are made available under the CC-BY license.

TextRunner Graph

TextRunner Graph

The TextRunner Graph is constructed from a subset of the TextRunner system's output, which is described in detail here. Special thanks to Oren Etzioni and Stephen Soderland for sharing this dataset.

YAGO Graph

YAGO Graph

The YAGO graph is constructed from a subset of the YAGO ontology and is made available under the GNU Free Documentation License.

TextRunner+YAGO Graph

TextRunner+YAGO Graph

Code

The code is now available as part of Junto, the label propagation toolkit, which implements various graph-based semi-supervised learning (SSL) algorithms. Please contact partha {at} talukdar.net if you have any questions.
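
For readers unfamiliar with graph-based SSL, the following is a minimal sketch of the general idea: labels from a few seed nodes are spread across weighted edges to unlabeled nodes. It is not Junto's code or API, and it is not necessarily any of the algorithms evaluated in the paper. The function name propagate_labels, the edge-list input format, and the toy graph are all assumptions made for illustration only.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Simplified label propagation with clamped seeds (illustrative only).

    edges: iterable of (node_a, node_b, weight) for an undirected graph.
    seeds: dict mapping a seed node to its known class label.
    Returns a dict mapping each node to a {label: score} dict.
    """
    neighbors = defaultdict(dict)
    for a, b, w in edges:
        neighbors[a][b] = w
        neighbors[b][a] = w

    # Seed nodes start with all mass on their label; other nodes start empty.
    scores = {n: ({seeds[n]: 1.0} if n in seeds else {}) for n in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Clamp seed nodes to their known label on every iteration.
                updated[node] = {seeds[node]: 1.0}
                continue
            total = sum(nbrs.values())
            acc = defaultdict(float)
            # Weighted average of the neighbors' previous label scores.
            for nbr, w in nbrs.items():
                for label, s in scores[nbr].items():
                    acc[label] += w * s / total
            updated[node] = dict(acc)
        scores = updated
    return scores


# Hypothetical toy graph: instance nodes linked to the contexts they occur in.
edges = [
    ("paris", "capital of", 1.0),
    ("london", "capital of", 1.0),
    ("london", "river thames", 1.0),
    ("einstein", "born in", 1.0),
    ("mozart", "born in", 1.0),
]
seeds = {"paris": "city", "einstein": "person"}
scores = propagate_labels(edges, seeds)
best = {n: max(s, key=s.get) for n, s in scores.items() if s}
```

In this toy graph, london is connected to the seed paris through a shared context node, so it can only receive score for the label city, and mozart likewise can only receive score for person. The datasets above are far larger; the sketch is meant to show the kind of input (a weighted graph plus seed labels) and output (per-node label scores) involved, not to reproduce any reported result.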
Date: July 7, 2010