February 14, 2017

CIDOC Namespaces

Fortunately, CIDOC seems to have finally added back pages for its individual entity classes and properties:

http://www.cidoc-crm.org/Version/version-6.2

Unfortunately, I don’t know if these pages can be considered persistent URIs, since each entity and property URL ends with “/version-6.2”, and the URLs do not redirect if that suffix is left off:

http://www.cidoc-crm.org/Entity/e2-temporal-entity/version-6.2

http://www.cidoc-crm.org/Property/p118-overlaps-in-time-with/version-6.2

Still, it is helpful at least in the conceptual mapping of the Whitney’s data to be able to quickly view the properties that can be applied to each entity class, and to view subclasses and superclasses of entities, without having to manually scroll through a huge PDF.
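The URL pattern in the two examples above can be sketched in Python. This is only a sketch: the pattern (lowercased class code and label joined by hyphens, followed by the version suffix) is inferred from those two URLs rather than from any CIDOC documentation, and the function name `cidoc_page_url` is made up for illustration.

```python
import re

BASE = "http://www.cidoc-crm.org"

def cidoc_page_url(kind, code, label, version="6.2"):
    """Build a page URL for a CIDOC entity or property.

    kind is "Entity" or "Property"; code is e.g. "E2" or "P118".
    The pattern is an assumption inferred from two example URLs.
    """
    # Lowercase the code and label, then collapse every run of
    # non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", f"{code} {label}".lower()).strip("-")
    return f"{BASE}/{kind}/{slug}/version-{version}"

print(cidoc_page_url("Entity", "E2", "Temporal Entity"))
print(cidoc_page_url("Property", "P118", "overlaps in time with"))
```

Because the version is a parameter, the same helper could be pointed at a later release, assuming CIDOC keeps the same URL layout.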

Separating Entities

I finally finished separating the data for each CIDOC entity class that has a many-to-many relationship with another class into its own table/Google Sheet. I went pretty granular, ending up with 14 different sheets:

E8 Acquisition

Used for non-purchase acquisition events.

E12 Production

Used to represent the creation of an artwork.

E22 Man-Made Object

(this could be problematic if used outside the context of the Founding Collection for newer items that don’t have a physical component)

Used to represent art objects in the Whitney’s collection.

E39 Actor

Used to represent the unique Whitney identifier for any kind of Constituent (Object, Acquisitions, and Ex-Collections-related).

E42 Identifier

Used to record non-Whitney unique identifiers for Constituents and the name authority domains to which they belong (currently LC, VIAF, ULAN, Wikidata; potentially the Smithsonian American Art Museum in the future).

E51 Contact Point

Used to record external URIs for Constituents on name authority sites.

E53 Place

Used to record the names and GeoNames IDs for the birth and death locations of artists. Could also be used for any other place identifiers like object locations.

E54 Dimension

Used to record dimensions of objects. Objects may have more than one listed dimension in TMS (framed vs. unframed, inches vs. cm), hence the need for a separate table.

E55 Type

Used to record the role type (artist, donor, art dealer, etc.) of an Actor in a given event. Vocabulary for these roles is from the Getty AAT.

E57 Material

Used to record the material components of art objects.

E67 Birth

Used to record Constituent birth dates and locations.

E69 Death

Used to record Constituent death dates and locations. Also includes links to the NYTimes Obituaries Joshua found.

E82 Actor Appellation

(deprecated in the latest version of CIDOC still in development [Version 6.2.2] in favor of the broader E41 Appellation)

Records the Display Names of Constituents along with any alternative forms.

E96 Purchase

(a new Event class being added in CIDOC Version 6.2.2)

Used to record information about the purchase of objects.

I could definitely simplify this conceptual model at a later point, but I figure that since the main audience for this data would be art historical researchers, maybe more specificity would be beneficial.
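To illustrate the kind of separation described above, here is a minimal Python sketch for the E54 Dimension case, where one object can carry several dimension rows. The column names (`object_id`, `title`, `dimension_type`, `value`, `unit`) and the sample rows are hypothetical and are not taken from TMS or from the actual sheets.

```python
import csv
import io

# Hypothetical flat export: one row per object/dimension pair.
FLAT = """object_id,title,dimension_type,value,unit
1,Painting A,unframed height,30,in
1,Painting A,framed height,34,in
2,Sculpture B,height,50,cm
"""

def split_tables(flat_csv):
    """Split a flat export into an object table and a dimension table."""
    objects = {}      # object_id -> object row, deduplicated
    dimensions = []   # one row per dimension, linked back by object_id
    for row in csv.DictReader(io.StringIO(flat_csv)):
        objects[row["object_id"]] = {
            "object_id": row["object_id"],
            "title": row["title"],
        }
        dimensions.append({
            "dimension_id": len(dimensions) + 1,
            "object_id": row["object_id"],
            "dimension_type": row["dimension_type"],
            "value": row["value"],
            "unit": row["unit"],
        })
    return list(objects.values()), dimensions

objects, dimensions = split_tables(FLAT)
print(len(objects), "objects,", len(dimensions), "dimensions")
```

Each resulting list could then be written out as its own sheet, with `object_id` acting as the key that ties a dimension back to its object.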

Neo4j

Neo4j apparently allows you to import CSV files directly and join them using its SQL-like query language, Cypher:

https://neo4j.com/developer/guide-importing-data-and-etl/

Given that, I may just import my CSV data directly into Neo4j after cleaning it in OpenRefine rather than bothering to make a MySQL database first. Starting with an ER diagram may be helpful, however.
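As a rough sketch of what that import might look like, the Python below only builds the Cypher `LOAD CSV` statements as strings; it does not connect to a Neo4j server. The file names, node labels, property names, and the `PRODUCED` relationship type are all hypothetical placeholders, not the actual sheet or column names.

```python
def node_import(file_name, label, key, props=()):
    """Return a Cypher statement that creates one node per CSV row."""
    lines = [
        f"LOAD CSV WITH HEADERS FROM 'file:///{file_name}' AS row",
        # MERGE avoids creating duplicate nodes if the file is loaded twice.
        f"MERGE (n:{label} {{{key}: row.{key}}})",
    ]
    if props:
        lines.append("SET " + ", ".join(f"n.{p} = row.{p}" for p in props))
    return "\n".join(lines)

def link_import(file_name, from_label, from_key, rel_type, to_label, to_key):
    """Return a Cypher statement that links existing nodes, one link per CSV row."""
    return "\n".join([
        f"LOAD CSV WITH HEADERS FROM 'file:///{file_name}' AS row",
        f"MATCH (a:{from_label} {{{from_key}: row.{from_key}}})",
        f"MATCH (b:{to_label} {{{to_key}: row.{to_key}}})",
        f"MERGE (a)-[:{rel_type}]->(b)",
    ])

print(node_import("objects.csv", "Object", "object_id", ["title"]))
print(link_import("production.csv", "Actor", "actor_id", "PRODUCED",
                  "Object", "object_id"))
```

The second statement plays the role of the table join: a join sheet with one row per actor/object pair becomes a set of relationships between nodes that were loaded earlier.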

Neo4j is also designed as more of a “big data” database.

CouchDB

CouchDB is another popular NoSQL system I see mentioned a lot.

However, CouchDB doesn’t support SPARQL either, relying instead on a JSON-based query language.

Blazegraph

The Metaphacts database used by the Florentine Renaissance Drawings project is built on top of a Blazegraph database.

Blazegraph is available as an open-source download. It also supports SPARQL queries.

It does seem somewhat harder to set up than something like Neo4j, however.

Update: I installed Blazegraph easily, and it has a straightforward local interface. However, I’m not really sure how applicable it is to this project, if at all.

OrientDB

At first glance, OrientDB seems like it might have an easier-to-use interface. I will download it and give it a shot.

http://orientdb.com/download/

OrientDB took some time to install. You have to install a server and console client locally to use it:

http://orientdb.com/docs/last/

At first glance, OrientDB seems more familiar and usable than previous graph databases I’ve looked at.

You can define Classes, for example (http://orientdb.com/docs/last/).

I’m not really sure if and how OrientDB supports linked data, however, which is obviously an issue.

DB Selection – Options

The number of database options is kind of overwhelming, quite frankly.

This is a helpful overview:

http://db-engines.com/en/ranking/rdf+store
