DBpedia to Wikidata: Exploring the Linked Jazz Name Directory

Project Title:

DBpedia to Wikidata: Exploring the Linked Jazz Name Directory

Course:

LIS-664 – Programming for Cultural Heritage, Fall 2016, Pratt Institute

Project Description:

This project was informed by my involvement in Pratt’s long-running Linked Jazz project, which explores the relationships and collaborations between jazz musicians by extracting relationship data from oral history transcripts and publishing it as semantically structured linked data. Linked Jazz recently received a grant from the New Orleans Jazz and Heritage Foundation to create a dataset focused specifically on New Orleans-based jazz musicians. As a test project for this grant, I wanted to explore what familial relationship data about New Orleans-based jazz musicians was available on Wikidata, a popular linked data repository. I also wanted to compare Wikidata to DBpedia, another popular linked data source, which was used to create the Linked Jazz dataset in 2011 but has been increasingly supplanted by Wikidata in the succeeding six years.

Project Documentation

View Project Website
Download Project Presentation

Methods:

I started by downloading the original Linked Jazz Name Directory, which is available online as an N-Triples file. The directory lists about 9,000 jazz musicians along with their persistent resource identifiers on DBpedia. To find the equivalent Wikidata identifiers, I first used a Python script to convert the names and DBpedia identifiers into JSON. Because DBpedia resources link to the identifiers for the same person, place, or thing in other linked data repositories, a second Python script could extract the corresponding Wikidata ID for each name in the directory. With these IDs, a third script queried Wikidata for each musician’s place of birth and place of death, and further scripts narrowed the directory down to musicians who had been born or had died in New Orleans. I then tried querying Wikidata for information about the relatives of these individuals, but found very little data. Because I already had birth and death locations for the musicians in the Linked Jazz Directory, I decided to experiment with the Tableau software platform to create a geomapped visualization of those locations.
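The scripts themselves are not reproduced here. As a rough illustration only, the pipeline described above might be sketched in Python along the following lines. Everything in it is an assumption rather than the project's actual code: the function names are hypothetical, the directory is assumed to store names as foaf:name literals, and the DBpedia owl:sameAs links are assumed to have been fetched already and passed in as a list of URIs. The sketch stops at building the SPARQL text and does not send any requests.

```python
import json
import re

# Matches one N-Triples statement whose object is a literal:
#   <subject> <predicate> "literal" .
# Escape sequences inside the literal are kept as-is, not decoded.
TRIPLE_RE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"(?:@[\w-]+|\^\^<[^>]+>)?\s*\.\s*$'
)

# Assumption: the name directory records names with foaf:name.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
WIKIDATA_PREFIX = "http://www.wikidata.org/entity/"


def parse_name_directory(lines):
    """Map each subject URI (here, a DBpedia resource) to its name literal."""
    people = {}
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if match and match.group(2) == FOAF_NAME:
            people[match.group(1)] = match.group(3)
    return people


def to_json(people):
    """Serialize the directory as a JSON list of {dbpedia, name} records."""
    records = [{"dbpedia": uri, "name": name} for uri, name in sorted(people.items())]
    return json.dumps(records, indent=2)


def wikidata_id(same_as_uris):
    """Return the first Wikidata item ID among owl:sameAs URIs, or None."""
    for uri in same_as_uris:
        if uri.startswith(WIKIDATA_PREFIX):
            return uri[len(WIKIDATA_PREFIX):]
    return None


def birth_death_query(qid):
    """Build SPARQL asking for place of birth (P19) and place of death (P20)."""
    return (
        "SELECT ?birth ?death WHERE { "
        f"OPTIONAL {{ wd:{qid} wdt:P19 ?birth . }} "
        f"OPTIONAL {{ wd:{qid} wdt:P20 ?death . }} "
        "}"
    )


def linked_to_place(record, place_qid):
    """True if the record's birth or death place matches the given item ID."""
    return place_qid in (record.get("birth"), record.get("death"))
```

In this sketch, the query text relies on the wd: and wdt: prefixes that the Wikidata Query Service predefines, and the place_qid passed to linked_to_place would be the Wikidata item ID for New Orleans, which needs to be looked up on Wikidata rather than taken from this example.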

My Role:

I worked on this project alone. Many of the scripts I used were based on scripts created by Molly Reese-Lerner and fellow Linked Jazz team member Hannah Sistrunk for an LIS-664-01 project the previous year, and on scripts created by Linked Jazz project member and METRO Fellow Karen Li-Lun Hwang.

Learning Objective Achieved:

Technology

Rationale:

This project was primarily based on Python programming. I also worked with data serialization formats including N-Triples and JSON, semantic web datasets, and data visualization software.

Additional Learning Objective Achieved:

Communication

Rationale:

Using Tableau, I created an interactive visualization of the jazz musician birth and death data I pulled from Wikidata.

Additional Learning Objective Achieved:

Reflective Practice

Rationale:

This project was in part meant to examine Wikidata as a replacement source for DBpedia, and to explore trends in the use of linked data datasets as a whole.