January 31, 2017

Data Refinement

I am continuing to work on cleaning the Whitney’s constituent data in OpenRefine and Google Sheets.

I started looking into Google Fusion Tables, which seems like a good alternative to Tableau as a geocoding application.

Possible source for data cleaning/reconciliation info:

http://freeyourmetadata.org/ 

As per this discussion (https://groups.google.com/forum/#!topic/openrefine/GwCGTM3NGOQ), there is apparently a version of OpenRefine specifically designed for Linked Data:

https://sourceforge.net/projects/lodrefine/?source=navbar

I also borrowed an e-Edition of this book:

http://book.freeyourmetadata.org/

http://search.ebscohost.com.ezproxy.pratt.edu:2048/login.aspx?direct=true&db=nlebk&AN=969817&site=ehost-live

Notes from Hooland, S. v., & Verborgh, R. (2014). Linked Data for Libraries, Archives and Museums : How to Clean, Link and Publish Your Metadata. London: Facet Publishing:

Page 23: The authors recommend one table per entity for a linked data database, as per basic relational database guidelines. In my database, I’m thinking of Events (artwork creation, donation, purchase, etc.) as entities. Indeed, in CIDOC, the E5 Event class sits three levels below E1 CRM Entity (E1 CRM Entity > E2 Temporal Entity > E4 Period > E5 Event).

Any CIDOC class could potentially be its own table by this thinking.

My current tables correspond to the following CIDOC Entities:

  • E5 Event
  • E22 Man-Made Object
  • E39 Actor

Subtypes of events could potentially be their own tables:

  • E12 Production
  • E8 Acquisition Event

Role (E55 Type) could be its own table, as the same actor may have multiple roles in relation to the same object.

The Role or Event tables could essentially act as associative tables (https://en.wikipedia.org/wiki/Associative_entity).
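
A minimal sketch of the associative-table idea, using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical placeholders for illustration, not the actual Whitney schema. It shows how one actor can hold several roles for the same object without duplicating the actor or object rows.

```python
import sqlite3


def build_schema(conn):
    """Create three entity tables plus one associative table linking them."""
    conn.executescript(
        """
        CREATE TABLE actor (
            actor_id INTEGER PRIMARY KEY,
            name     TEXT NOT NULL
        );
        CREATE TABLE man_made_object (
            object_id INTEGER PRIMARY KEY,
            title     TEXT NOT NULL
        );
        CREATE TABLE role_type (
            role_id INTEGER PRIMARY KEY,
            label   TEXT NOT NULL UNIQUE
        );
        -- Associative table: one row per (actor, object, role) combination.
        CREATE TABLE actor_object_role (
            actor_id  INTEGER NOT NULL REFERENCES actor(actor_id),
            object_id INTEGER NOT NULL REFERENCES man_made_object(object_id),
            role_id   INTEGER NOT NULL REFERENCES role_type(role_id),
            PRIMARY KEY (actor_id, object_id, role_id)
        );
        """
    )


def roles_for(conn, actor_name, object_title):
    """Return the sorted role labels one actor holds for one object."""
    rows = conn.execute(
        """
        SELECT r.label
        FROM actor_object_role AS link
        JOIN actor AS a ON a.actor_id = link.actor_id
        JOIN man_made_object AS o ON o.object_id = link.object_id
        JOIN role_type AS r ON r.role_id = link.role_id
        WHERE a.name = ? AND o.title = ?
        ORDER BY r.label
        """,
        (actor_name, object_title),
    ).fetchall()
    return [row[0] for row in rows]


# Placeholder data: one actor who is both artist and donor of one work.
conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.execute("INSERT INTO actor VALUES (1, 'Example Artist')")
conn.execute("INSERT INTO man_made_object VALUES (1, 'Example Work')")
conn.executemany("INSERT INTO role_type VALUES (?, ?)", [(1, "artist"), (2, "donor")])
conn.executemany("INSERT INTO actor_object_role VALUES (?, ?, ?)", [(1, 1, 1), (1, 1, 2)])

print(roles_for(conn, "Example Artist", "Example Work"))
```

The composite primary key on the associative table is one possible design choice: it blocks the same actor/object/role combination from being entered twice.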

Since Joshua has artist birth and death dates and location:

  • E67 Birth
  • E69 Death
  • E53 Place

Basically, with CIDOC, I should think of things in terms of Class=Table and Property=Column, to represent things with a relational structure that makes sense.

Or rather, each table is a CIDOC Class (AKA entity). Within a table, each Property is a column. These columns are populated by instances of other Classes (or by literal values), unless there is a many-to-many relationship between the two Classes, in which case the relationship gets its own table.

Subject = Table

Predicate = Column

Object = URI or literal
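
To make that mapping concrete, here is a rough sketch that turns one table row into subject-predicate-object triples. It assumes the subject is a row of the table, identified by the class name plus the row's key. The function name, the "id" key column, and the sample row are hypothetical; the column names borrow the CIDOC properties P1 is identified by and P108i was produced by purely as examples.

```python
def row_to_triples(table, key_column, row):
    """Turn one table row into (subject, predicate, object) triples.

    table      -- the CIDOC class the table stands for
    key_column -- the column holding the row's identifier
    row        -- a dict mapping column names (properties) to values
    """
    subject = f"{table}/{row[key_column]}"
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != key_column and value is not None  # skip the key and empty cells
    ]


# Placeholder row from a hypothetical E22 table.
example_row = {
    "id": 1,
    "P1_is_identified_by": "Example Work",
    "P108i_was_produced_by": "E12_Production/7",
}

triples = row_to_triples("E22_Man-Made_Object", "id", example_row)
for triple in triples:
    print(triple)
```

Each non-key column yields one triple, so a row with empty cells simply produces fewer statements.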

As of the latest release of CIDOC, issued this month, the E82 Actor Appellation class has been deprecated in favor of the generic E41 Appellation.

The CIDOC property “P48 has preferred identifier” (inverse: “is preferred identifier of”) should define the primary key of any given table.
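
Before promoting a preferred identifier to a primary key, it is worth checking that it is actually unique. A small sketch of that check follows; the function name, the "accession_no" column, and the sample values are made up for illustration.

```python
from collections import Counter


def duplicate_identifiers(rows, id_column):
    """Return the sorted identifier values that appear in more than one row."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Placeholder rows; "accession_no" is a hypothetical identifier column.
sample_rows = [
    {"accession_no": "A.1"},
    {"accession_no": "A.2"},
    {"accession_no": "A.1"},
]

print(duplicate_identifiers(sample_rows, "accession_no"))
```

An empty result means the column is safe to use as a key; anything else needs cleaning first.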

For now, I am going to focus on making sure all my table columns conform to a CIDOC property.

The CIDOC Class Hierarchy: http://cidoc-crm.org/cidoc_graphical_representation_v_5_1/class_hierarchy.html

Database Setup

I decided to join METRO, through which I’m hoping to get access to Lynda.com and take some online courses in database management.

Specifically, I’m interested in this course (https://www.lynda.com/NoSQL-tutorials/NoSQL-SQL-Professionals/368756-2.html), which is focused on NoSQL for people with SQL backgrounds, and which touches specifically on the management of museum data.

There’s also this course: (https://www.lynda.com/NoSQL-tutorials/Up-Running-NoSQL-Databases/111598-2.html)

METRO only has 8 licenses, however, and I’m not sure what the turnaround time for these is, so I may just sign up for Lynda independently.

UPDATE – Pratt apparently provides access to Lynda as well:

http://libguides.pratt.edu/lynda 
