Workspaces:1:Literature Review

From IKMEmergent

Brief Literature Review: Linked Open Data  

This wiki page is a work-in-progress brief literature review looking at Linked Open Data in Development. It will be developed over September and October 2010.


Understanding linked and open data

For present purposes, data can be understood as information that has been recorded or encoded as discrete facts, generally according to some uniform standard. Organisations and projects generate vast quantities of data in their day-to-day work: figures entered into spreadsheets, meta-data descriptions of video recordings, administrative data such as staff travel records, and operational data such as the volume of enquiries relating to a particular project.

This data is often held internally for use only by that organisation, and the way it is recorded relies on internal labels and standards. For example, the column headings in a spreadsheet might only make sense to someone from the same organisation, or the author field in the meta-data for a video recording might accept free text, leading to the same video-maker being known by many different labels (e.g. Michael Powell; Mike Powell; Mike Powel; and so on). Linked data provides a set of conventions for recording data that use URIs (Uniform Resource Identifiers) as the identifiers for things (and relationships) within a dataset.
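The effect of replacing free-text labels with a shared identifier can be sketched in a few lines of Python. This is only an illustration: the example.org URIs and the label table are hypothetical, and the only real vocabulary term used is the Dublin Core `creator` property.

```python
# Hypothetical base URI and label table, invented for illustration only.
BASE = "http://example.org/person/"
DC_CREATOR = "http://purl.org/dc/terms/creator"  # Dublin Core 'creator' property

# Free-text labels as they might appear in an author field,
# all mapped to one identifier for the same person.
LABEL_TO_URI = {
    "Michael Powell": BASE + "michael-powell",
    "Mike Powell": BASE + "michael-powell",
    "Mike Powel": BASE + "michael-powell",
}


def author_triple(video_uri, label):
    """Return a (subject, predicate, object) triple, or None for an unknown label."""
    person_uri = LABEL_TO_URI.get(label.strip())
    if person_uri is None:
        return None
    return (video_uri, DC_CREATOR, person_uri)


# Three hypothetical video records with inconsistent author labels.
records = [
    ("http://example.org/video/1", "Michael Powell"),
    ("http://example.org/video/2", "Mike Powell"),
    ("http://example.org/video/3", "Mike Powel"),
]
triples = [author_triple(video, label) for video, label in records]
authors = {triple[2] for triple in triples}
```

Although the three records carry three different labels, `authors` ends up holding a single URI, so a query for that person's videos finds all of them.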



Berners-Lee's Linked Data design principles.


The five stars of open, linked data.



Semantic Web

Aldo de Moor distinguishes between the Syntactic, Semantic and Pragmatic web [1].


Drivers for linked open data initiatives

The Panton Principles for open science data do not at present (FAQ, September 2010) apply to social science data, suggesting that different principles and norms may be appropriate for the open publishing of social statistics and qualitative research data.

Open Research

A range of open research projects are available.



Creating and using linked open data: encoding and interpreting

Halb et al. (2008) note that many linked data efforts take a 'machine-first' approach to publishing data, offering only machine-readable data and using template-based interlinking algorithms to make large numbers of links between datasets. However, they argue that this approach leads to links with limited "semantic strength". They prefer to publish linked data in both human-readable and machine-readable formats, with a wiki-based mechanism that lets human users of a dataset suggest links between the data they are viewing and other datasets.
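A rough sketch of what a template-based interlinking step might look like is given below. It is not the algorithm used by Halb et al.; the local example.org URIs and the helper `template_links` are hypothetical, and the sketch simply assumes that a target URI can be guessed by slotting a label into the DBpedia resource URI pattern.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBPEDIA_TEMPLATE = "http://dbpedia.org/resource/{slug}"


def template_links(labels_by_uri, template=DBPEDIA_TEMPLATE):
    """Generate one owl:sameAs link per local resource by filling a URI template."""
    links = []
    for uri, label in sorted(labels_by_uri.items()):
        # Spaces become underscores, as in DBpedia resource names.
        slug = "_".join(label.split())
        links.append((uri, OWL_SAME_AS, template.format(slug=slug)))
    return links


# Hypothetical local dataset: resource URI -> human-readable label.
local_places = {
    "http://example.org/place/fr": "France",
    "http://example.org/place/uk": "United Kingdom",
    "http://example.org/place/ge": "Georgia",
}
links = template_links(local_places)
```

The link generated for "Georgia" is the same whether the local record describes the country or the US state. This is the kind of weak link that a human reviewer, given a readable view of the data, could catch and correct.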

Halb et al. (2008; §6) note that modelling can involve interpretation of the dataset that is only possible after reading the full documentation. As they explain, "the raw data from Eurostat is sometimes ambiguous and can only be resolved by analysing the corresponding document. For example the statement time\2007 can stand for the value over a period of time (e.g. entire year) or at the end of the reporting period (e.g. 31 Dec)."
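The ambiguity can be made concrete with a small Python sketch. The function name `interpret_time` and the mode labels are assumptions introduced here for illustration; they are not part of the Eurostat data or of Halb et al.'s tooling.

```python
from datetime import date


def interpret_time(year, mode):
    """Return (start, end) dates for a bare year value such as time\\2007.

    mode "period": the value covers the entire year.
    mode "end":    the value is a snapshot at the end of the reporting period.
    """
    if mode == "period":
        return (date(year, 1, 1), date(year, 12, 31))
    if mode == "end":
        return (date(year, 12, 31), date(year, 12, 31))
    raise ValueError("mode must be 'period' or 'end'")


whole_year = interpret_time(2007, "period")
year_end = interpret_time(2007, "end")
```

The same raw value yields two different temporal extents, and only the documentation for each table indicates which mode applies. A modeller therefore has to record that choice explicitly when encoding the data.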



Often choices about data representation are affected by what makes for efficient parsing of the resulting triples.

Linked open data eco-systems

The Linked Open Data (LOD) Cloud diagram (now generated from CKAN based on user-submitted details) attempts to show visually the linked open data sources currently available, along with their interlinkages. The Wikipedia extract DBpedia plays a pivotal role in many interlinking efforts. It has been suggested (Auer et al. 2007; Bizer et al. 2009) that DBpedia is a key 'nucleus' for a web of open data, providing a 'bottom up' alternative to top-down schema-setting efforts to build a semantic web. As of March 2010 DBpedia includes extracts in 11 languages with varying numbers of 'abstracts' (short/long descriptions of things) available in each language: English (3,144,000), German (503,000), French (545,000), Polish (430,000), Dutch (392,000), Italian (381,000), Spanish (362,000), Japanese (275,000), Portuguese (367,000), Swedish (213,000), Chinese (179,000). Extracts appear, however, to be predominantly based on the English-language version of Wikipedia. WordNet also plays an important role in the extraction of information from Wikipedia for the YAGO knowledge base.


Data enrichment services: for example, TSO have built a data enrichment service for government (http://gov.tso.gov.uk, which appears to be unavailable from outside government) that can take press releases and add URIs, annotations, etc.
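In outline, enrichment of this kind means finding mentions of known things in a text and attaching identifiers to them. The sketch below is a minimal illustration using exact string matching against a small lookup table. It makes no claim about how the TSO service works, and the gazetteer entries and example.org URIs are hypothetical.

```python
import re

# Hypothetical gazetteer: known label -> identifier. Invented for illustration.
GAZETTEER = {
    "World Bank": "http://example.org/id/world-bank",
    "Department for International Development": "http://example.org/id/dfid",
}


def enrich(text, gazetteer):
    """Return sorted (start, end, label, uri) annotations for each exact label match."""
    annotations = []
    for label, uri in gazetteer.items():
        for match in re.finditer(re.escape(label), text):
            annotations.append((match.start(), match.end(), label, uri))
    return sorted(annotations)


press_release = (
    "The World Bank and the Department for International Development "
    "released new figures."
)
annotations = enrich(press_release, GAZETTEER)
```

Each annotation records the character span of a mention alongside its URI, so the original text is left untouched and the links can be stored or rendered separately. A production service would also need to handle variant spellings and ambiguous names, which exact matching does not.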

Open data in development

See Development Data Search.



Linked data create 'global variables' in a truly global form.


Critical Questions


Technical Notes

Distinctions

The following need to be distinguished...

OWL Schema

RDF Schema


(Draft) An ontology contains knowledge, whereas a schema describes how knowledge should be recorded and represented.


See [4] [5] etc.

Bibliography
