Design

This page describes the design of the takco clustering and linking pipeline.

Pipeline

Schematic overview of our pipeline. In Phase 1 (a-c), we process the set of all Wikipedia tables to clean up editorial structures. In Phase 2 (d-g), we cluster them to form larger union tables. In Phase 3 (h-j), we integrate them with Wikidata and extract binary and n-ary facts.
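
To make the notion of a union table in Phase 2 concrete, here is a minimal Python sketch. It is not takco's implementation: it assumes that tables with exactly the same header belong together, which is a far simpler criterion than a real clustering step would use. The table layout, the sample data, and the function name `cluster_by_header` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy tables: each has a header tuple and a list of body rows.
tables = [
    {"header": ("Year", "Winner"), "rows": [("2001", "Alice")]},
    {"header": ("Year", "Winner"), "rows": [("2002", "Bob")]},
    {"header": ("City", "Population"), "rows": [("Utrecht", "361,000")]},
]


def cluster_by_header(tables):
    """Group tables with identical headers and stack their rows.

    Assumption for illustration only: an exact header match stands in
    for the clustering criterion.
    """
    unions = defaultdict(list)
    for table in tables:
        unions[table["header"]].extend(table["rows"])
    return dict(unions)


union_tables = cluster_by_header(tables)
for header, rows in union_tables.items():
    print(header, rows)
```

In this sketch the two tables that share a header are merged into a single, larger union table, while the third table stays on its own.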

Extract

Reshape

Cluster