Novelty analysis
In this notebook, we'll look at the trade-offs between novelty and accuracy in semantic matching.
[1]:
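import logging
logging.getLogger().setLevel(logging.INFO)
import takco
import pandas as pd

# Load the configuration and the linked tables from the baseline run.
conf = takco.config.parse('resources/config-dbpedia.toml')
tables = takco.TableSet.load('output/t2d-v2-baseline-2/1-link/*')

# Score the linked tables against the T2D v2 annotations (key column only).
t2dv2 = takco.config.build('t2d-v2', conf)
scored_tables = tables.score(t2dv2, keycol_only=True)
scored_tables.tables.persist()

# Preview only the tables that have at least one non-empty gold annotation.
takco.preview(t for t in scored_tables if any(t.get('gold', {}).values()))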
INFO:root:Loading data from resources/t2d_fix.csv
INFO:root:Read 512 tables from data/t2d-v2/tables
INFO:root:Read 512 entity tables from data/t2d-v2/instance
INFO:root:Loaded 514 annotated tables
[1]:
| ? | 0 | 1 | 2 |
|---|---|---|---|
| ∈ | | | |
| 0 | | | |
| | Title | Author | Source |
| | Adventures of Huckleberry Finn | Mark Twain | ALA [11] |
| | The Adventures of Super Diaper Baby | Dav Pilkey | ALA [47] |
| | The Adventures of Tom Sawyer | Mark Twain | ALA |
| | Alice series | Phyllis Reynolds Naylor | ALA [2] |
| | All the King's Men | Robert Penn Warren | Rad |
(146 more rows)
| ? | 0 | 1 | 2 |
|---|---|---|---|
| ∈ | | | |
| | # | Media | MIX |
| | 1 | Dainik Jagran | 27.500 |
| | 2 | Dainik Bhaskar | 14.000 |
| | 3 | Aajtak TV | 7.000 |
| | 4 | CNN Editions (International) | 6.000 |
| | 5 | Dinakaran | 5.000 |
(16 more rows)
| ? | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| ∈ | | | | | | | |
| 2 | | | | | floorCount | | openingDate |
| | # | Gebäude | Gebäude | Stadt | Etagen | Höhe | Jahr |
| | 1 | | Burj Khalifa | Dubai | 163 | 2.717 ft | 2010 |
| | 2 | | Makkah Clock Royal Tower [Abraj Al Bait] | Mekka | 95 | 1.972 ft | 2012 |
| | 3 | | Taipei 101 | Taipei | 101 | 1.671 ft | 2004 |
| | 4 | | Shanghai World Financial Center | Shanghai | 101 | 1.614 ft | 2008 |
| | 5 | | International Commerce Centre [Union Square] | Hong Kong | 118 | 1.588 ft | 2010 |
(195 more rows)
| ? | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| ∈ | | | | | |
| 1 | | | | | |
| | Rank | Company | Industry | Temkin Experience Rating (TER) | Company TER vs Industry TER |
| | 1 | Sam's Club | Retailer | 85% | 13.0 |
| | 2 | Publix | Grocery Chain | 81% | 4.9 |
| | 3 | A credit union | Bank | 80% | 14.5 |
| | 3 | Chick-fil-A | Fast Food Chain | 80% | 6.2 |
| | 3 | Subway | Fast Food Chain | 80% | 6.4 |
(201 more rows)
| ? | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| ∈ | | | | | | | |
| | PEAK | RANKING | MAP | GUIDE | GRID REF | ALT (ft) | ALT (m) |
| | Allen Crags | 43 | SW | E | NY 236 085 | 2,572 | 784 |
| | Angletarn Pikes | 143 | NE | FE | NY 414 148 | 1,857 | 566 |
| | Ard Crags | 142 | NW | NW | NY 207 197 | 1,860 | 567 |
| | Armboth Fell | 182 | NW | C | NY 297 159 | 1,570 | 479 |
| | Arnison Crag | 194 | NE | E | NY 394 150 | 1,424 | 434 |
(210 more rows)
| ? | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| ∈ | | | | | |
| 1 | frenchName | | region | | timeZone |
| | A | Nom en anglais | Endroit | Capitale | Heure |
| | Afghanistan | Afghanistan | Asie | Kabul | +4.5 |
| | Afrique du Sud | South Afrique | Afrique | Pretoria | +2 |
| | Albanie | Albania | Europe | Tirane | +1 |
| | Alderney (UK) voir les Anglo-Normandes | Alderney | Europe | | 0 |
| | Algérie | Algeria | Afrique | Algiers | +1 |
(228 more rows)
| ? | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| ∈ | | | | | | | |
| | | Country Name: | Population | Area (Sq. Km.) | Population Density (Sq. Km.) | Area (Sq. Mi.) | Population Density (Sq. Mi.) |
| | 36 | China | 1,339,190,000 | 9,596,960.00 | 139.54 | 3,705,405.45 | 361.42 |
| | 77 | India | 1,184,639,000 | 3,287,590.00 | 360.34 | 1,269,345.07 | 933.27 |
| | 183 | United States of America | 309,975,000 | 9,629,091.00 | 32.19 | 3,717,811.29 | 83.38 |
| | 78 | Indonesia | 234,181,400 | 1,919,440.00 | 122.01 | 741,099.62 | 315.99 |
| | 24 | Brazil | 193,364,000 | 8,511,965.00 | 22.72 | 3,286,486.71 | 58.84 |
(188 more rows)
| ? | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| ∈ | | | | | | |
| 0 | | | releaseDate | releaseDate | | |
| | Title | Publisher | EU Release Date | AU Release Date | PEGI | ACB |
| | Donkey Kong Country | Nintendo | 2006-12-08 | 2006-12-07 | 7 | G |
| | F-Zero | Nintendo | 2006-12-08 | 2006-12-07 | 3 | G |
| | SimCity | Nintendo | 2006-12-29 | 2006-12-29 | 3 | G |
| | Super Castlevania IV | Konami | 2006-12-29 | 2006-12-29 | 3 | PG |
| | Street Fighter II: The World Warrior | Capcom | 2007-01-19 | 2007-01-19 | 12 | PG |
(60 more rows)
| ? | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| ∈ | | | | | |
| 1 | | | programmeFormat | | |
| | Dial Location | Call Letters | Format | Address | Telephone |
| | AM 790 | KABC (ABC Radio Networks) | News/Talk | 3321 S La Cienega Blvd, Los Angeles 90016 | (310) 840-4900 |
| | AM 900 | KALI AM | Spanish News/Talk | 747 E Green St, Pasadena 91101 | (626) 844-8882 |
| | AM 1300 | KAZN (Asian Radio) | Chinese Variety | 747 E Green St, Pasadena 91101 | (626) 568-1300 |
| | AM 1580 | KBLA | Spanish News/Talk | 123 Figueroa St, #101A, Los Angeles 90012 | (213) 628-8700 |
| | AM 740 | KBRT (K-Bright) | Religious Talk | 3183-D Airway Ave, Costa Mesa 92626 | (714) 754-4450 |
(25 more rows)
| ? | 0 | 1 | 2 |
|---|---|---|---|
| ∈ | | | |
| | Local Health Boards | Hospital name | Link Surgeons |
| | Abertawe Bro Morgannwg University LHB | Morriston Hospital (Swansea) | Roger Morgan |
| | | Singleton Hospital (Swansea) | Roger Morgan |
| | | Princess of Wales Hospital (Bridgend) | Roger Morgan |
| | Aneurin Bevan LHB | Neville Hall Hospital (Abergavenny) | Richard Blackett |
| | | Royal Gwent Hospital (Newport) | Ahmed Shandall |
(11 more rows)
[21]:
db = takco.config.build('dbpedia_t2ksubset', conf)
novelty_tables = scored_tables.triples().novelty(db)
novelty_tables.tables.persist()
len(list(novelty_tables))
WARNING:rdflib.term:http://dbpedia.org/resource/Toys_"R"_Us does not look like a valid URI, trying to serialize this will break.
WARNING:rdflib.term:http://dbpedia.org/resource/U_ _Ur_Hand does not look like a valid URI, trying to serialize this will break.
WARNING:rdflib.term:http://dbpedia.org/resource/"Weird_Al"_Yankovic:_The_Videos does not look like a valid URI, trying to serialize this will break.
[21]:
235
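The report further down splits extractions into `existing`, `attnovel` and `valnovel`, but this notebook does not define those labels. One plausible reading, which is an assumption here and not taken from the takco source, is: a triple is `existing` if the knowledge base already contains it, `attnovel` if the knowledge base has no value at all for that subject and property, and `valnovel` if the knowledge base has some other value for that subject and property. A minimal sketch of that reading on toy data (the function `classify_novelty` and the triples are hypothetical):

```python
def classify_novelty(triple, kb_triples):
    """Label one (subject, property, object) triple relative to a KB.

    Assumed semantics (not confirmed by the takco source):
      existing - the exact triple is already in the KB
      attnovel - the KB has no value for this subject/property pair
      valnovel - the KB has a different value for this subject/property pair
    """
    s, p, o = triple
    known_objects = {ko for (ks, kp, ko) in kb_triples if (ks, kp) == (s, p)}
    if o in known_objects:
        return "existing"
    if not known_objects:
        return "attnovel"
    return "valnovel"


# Toy knowledge base and extractions, made up for illustration only.
kb = {
    ("Burj_Khalifa", "floorCount", "163"),
    ("Taipei_101", "openingDate", "2004"),
}
extracted = [
    ("Burj_Khalifa", "floorCount", "163"),    # already known
    ("Burj_Khalifa", "openingDate", "2010"),  # no openingDate in the toy KB
    ("Taipei_101", "openingDate", "2005"),    # conflicts with the toy KB value
]
labels = [classify_novelty(t, kb) for t in extracted]
print(labels)
```

Under this reading, `valnovel` extractions are the riskiest, because they contradict or supplement a value the knowledge base already holds.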
[22]:
report = novelty_tables.report(keycol_only=True)
display(pd.DataFrame.from_dict(report['scores'], orient='index').style.set_caption('Predictions:'))

def reform_dict(dictionary, t=(), reform=None):
    """Flatten nested dicts into {tuple of keys: innermost dict}."""
    if reform is None:  # avoid a mutable default shared between calls
        reform = {}
    for key, val in dictionary.items():
        path = t + (key,)
        if isinstance(val, dict) and all(isinstance(v, dict) for v in val.values()):
            reform_dict(val, path, reform)
        else:
            reform[path] = val
    return reform

pd.DataFrame.from_dict(reform_dict(report['novelty']), orient='index').style.set_caption('Extractions:')
INFO:root:Collected 26104 gold and 24061 pred for task entities
INFO:root:Collected 434 gold and 157 pred for task properties
INFO:root:Collected 235 gold and 235 pred for task classes
| | precision | recall | f1-score | support | predictions |
|---|---|---|---|---|---|
| entities | 0.867670 | 0.799762 | 0.832333 | 26104 | 24061 |
| properties | 0.777070 | 0.281106 | 0.412860 | 434 | 157 |
| classes | 0.740426 | 0.740426 | 0.740426 | 235 | 235 |
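The `reform_dict` helper in the cell above turns the nested novelty report into a flat mapping keyed by tuples, which pandas then renders as a multi-level row index. The following standalone sketch shows the same idea on a small made-up dictionary; `flatten_nested` is a hypothetical name and the counts are not real results.

```python
def flatten_nested(d, path=()):
    """Flatten nested dicts into {tuple of keys: innermost dict}.

    Recursion stops at the first level whose values are not all dicts,
    so the innermost metric dicts (tp, fp, ...) are kept intact.
    """
    flat = {}
    for key, val in d.items():
        new_path = path + (key,)
        if isinstance(val, dict) and val and all(isinstance(v, dict) for v in val.values()):
            flat.update(flatten_nested(val, new_path))
        else:
            flat[new_path] = val
    return flat


# Made-up numbers, only to show the shape of the result.
nested = {
    "kb": {
        "label": {
            "existing": {"tp": 3, "fp": 1},
            "attnovel": {"tp": 2, "fp": 2},
        }
    }
}
flat = flatten_nested(nested)
for path, metrics in flat.items():
    print(path, metrics)
```

Each key is the full path through the nesting, so the three index levels in the Extractions table correspond to the three elements of each tuple.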
[22]:
| | | | tp | fn | fp | precision | recall | f1 |
|---|---|---|---|---|---|---|---|---|
| dbpedia_t2ksubset | label | existing | 9471 | 1027 | 728 | 0.928620 | 0.902172 | 0.915205 |
| | | attnovel | 2549 | 2424 | 1030 | 0.712210 | 0.512568 | 0.596118 |
| | | valnovel | 351 | 468 | 349 | 0.501429 | 0.428571 | 0.462146 |
| | class | existing | 5676 | 2190 | 3530 | 0.616554 | 0.721587 | 0.664948 |
| | | attnovel | 1886 | 1464 | 1332 | 0.586078 | 0.562985 | 0.574300 |
| | | valnovel | 62 | 1546 | 432 | 0.125506 | 0.038557 | 0.058991 |
| | property | existing | 3100 | 580 | 994 | 0.757206 | 0.842391 | 0.797530 |
| | | attnovel | 2858 | 2450 | 2204 | 0.564599 | 0.538433 | 0.551205 |
| | | valnovel | 3422 | 412 | 2257 | 0.602571 | 0.892540 | 0.719437 |
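The precision, recall and F1 columns follow from the `tp`, `fn` and `fp` counts by the standard definitions. As a quick check, this sketch recomputes them for the `label` / `existing` row; the helper name `prf` is hypothetical and not part of takco.

```python
def prf(tp, fn, fp):
    """Standard precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the label / existing row of the Extractions table.
p, r, f = prf(tp=9471, fn=1027, fp=728)
print(f"precision={p:.6f} recall={r:.6f} f1={f:.6f}")
```

The recomputed values agree with the table row (0.928620, 0.902172, 0.915205) to about five decimal places, which suggests the report uses these definitions.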