A daily journal from my 2016-2017 Fellowship at the Whitney Museum
As far as the direction of the project, after reading Joshua’s documentation of his work this past year, I’m thinking I might like to focus on enriching data on collection objects or creating relationships between them, unless you think it would be better to continue Joshua’s focus on the museum’s artists. Given the limited scope of the project to 1931-48, I assume this would mean objects either created or acquired during this time?
I recently read an article on projects created by SVA’s MFA in Visual Narrative program based on mapping collections at the Metropolitan Museum (http://hyperallergic.com/314638/interactive-maps-of-the-metropolitan-museum-offer-fresh-views-of-its-permanent-collections/). Students created interactive maps in part based on relationships between objects in the collection; for instance, indigo objects, or sculptures depicting the female form. I’m thinking it might be interesting to similarly map or create some kind of interactive interface connecting objects in the Whitney collection based on shared traits or themes. This might be done through information collected from fields in TMS like Medium, or through enrichment from some outside source.
Joshua mentioned expanding on projects by the Smithsonian American Art Museum or the Yale Center for British Art as a possible future direction for the project. The Smithsonian website discusses their use of the CIDOC Conceptual Reference Model ontology to map relationships between objects in their collection (http://americanart.si.edu/collections/search/lod/about/). In reading about CIDOC CRM, I came across a project called ResearchSpace being developed by the British Museum (http://www.researchspace.org/). Their Semantic Search component integrates data from Wikidata to allow users to search for objects based on the relationships of entities to one another. Joshua mentions using data from Wikidata and the Art and Architecture Thesaurus to enrich the data he pulled from TMS, and I wonder how this enriched data could potentially be used to create links between objects.
Joshua also mentions provenance and/or exhibition data as another possible area for further work. Another project direction I’m considering would involve linking or mapping objects to other institutions where they were previously shown. I did a lot of research on object provenance during my former job working at an art gallery, and I wonder if and how it might be possible to make this process easier and more intuitive for the museum’s researchers. It might also be interesting to enrich the museum’s data on the exhibition history of its artists using data from outside sources. Or, alternately, to add places where artists worked or exhibited to Joshua’s existing map of places of birth and death.
Using TMS search, identify all objects acquired between 1931-1948
Generate a report on the exhibition history of each piece
Connect these places to entries on Wikidata. Could also refer to Artsy API.
Collect images of objects from TMS
Create script to link objects to institutional data
Could chart artistic media/artists represented in 1931-48 acquisitions in the vein of Oliver Roeder’s FiveThirtyEight article using MoMA API data.
Having started Matt Miller’s Program for Cultural Heritage class, I looked at a few of the past museum and art-focused projects done by students in the class to get some ideas. One interesting project was Carlos Acevedo’s DBO: Influence project, focused on the influence property in DBpedia’s ontology as it relates to contemporary artists. I wasn’t able to access Carlos’ final csv file or the visualizations he created in Gephi, but exploring a relationship property like influence seems interesting. I noticed Wikidata has “student of”, “educated at”, and “notable work” properties for some artists, for instance.
Another Program for Cultural Heritage project I looked at was Analyzing Modernism: Modern and Contemporary Painting and Sculpture at The Metropolitan Museum of Art. This project seems to have strictly used data from the Met’s collection site, but it has some interesting ideas for how to visualize collection data. Tableau and TimelineJS might be tools to explore in more depth at some point.
In order to narrow down some possible entities to focus on from the Whitney’s founding collection, I began browsing in TMS to see where objects in the collection were purchased from. Many objects in the Whitney’s founding collection were purchased directly from their creators, but I did find the names of some dealers and organizations, including:
The Cosmopolitan Club: A private all-women’s social club on the Upper East Side
(https://en.wikipedia.org/wiki/Cosmopolitan_Club_(New_York))
Ferargil Galleries: A commercial art gallery, run from 1915 to 1955 by Frederic Newlin Price. They dealt mostly in American art (http://www.aaa.si.edu/collections/ferargil-galleries-records-8905)
The Whitney Studio Club: the antecedent to the Whitney Museum (http://cdm16694.contentdm.oclc.org/cdm/landingpage/collection/p15405coll1)
Roman Bronze Works: A bronze factory in Corona, Queens that was the country’s leading art foundry during the American Renaissance. (https://en.wikipedia.org/wiki/Roman_Bronze_Works)
Marie Sterner Fine Arts (http://www.aaa.si.edu/collections/marie-sterner-and-marie-sterner-gallery-papers-9479)
Daniel Chester French: An artist with work in the founding collection, known for his Lincoln Memorial statue. I was interested to read in the notes field for one of his pieces in TMS that his studios were located on McDougal Alley, fairly close to the current location of the Whitney.
https://en.wikipedia.org/wiki/Daniel_Chester_French
http://www.metmuseum.org/toah/hd/fren/hd_fren.htm
C.W. Kraushaar Art Galleries: Founded in 1885 and still in existence today. Kraushaar Galleries sold many early Whitney collection items, including Charles Demuth’s “My Egypt”
(https://en.wikipedia.org/wiki/Kraushaar_Galleries)
(http://www.kraushaargalleries.com/history/)
(http://gildedage2.omeka.net/)
Valentine Gallery: 1924-1948, founded by F. Valentine Dudensing.
(http://www.aaa.si.edu/collections/valentine-gallery-records-7103)
Wildenstein Galleries: Still in operation today.
Frank K.M. Rehn Galleries: 1918-1981. Lots of correspondence between Rehn and Juliana Force, as well as with the Whitney Studio Club, is available online.
(http://www.aaa.si.edu/collections/frank-km-rehn-galleries-records-9193)
(http://www.aaa.si.edu/collections/container/viewer/Whitney-Studio-Club-Wills–203619)
N.E. Montross: (https://gildedage.omeka.net/exhibits/show/galleriesandclubs/galleries/montross)
Max Kuehne: In addition to Kuehne’s own work, the founding collection also contains work by other artists donated by Kuehne.
The Downtown Gallery: Run by Edith Halpert
(https://en.wikipedia.org/wiki/Edith_Halpert)
Macbeth Gallery: 1892-1953
(http://www.aaa.si.edu/collections/macbeth-gallery-records-9703)
After my second week of classes and listening to Professor Pattuelli lecture on Linked Open Data, I have a somewhat better sense of how the RDF triples used with LOD work, and how to use them to express relationships. Our discussion of LIDO and CIDOC-CRM in Art Documentation has made me start thinking about how to map the CIDOC ontology onto this triple structure in working with the Whitney’s data. Joshua used Schema.org vocabulary to build relationships in his dataset; I wonder whether the CIDOC-CRM ontology might offer a richer set of relationship terms, and whether it might make the Whitney’s data more interoperable with that of other cultural institutions. At the same time, the broad scope of the Schema.org ontology might make the Whitney’s data more accessible outside of the museum world.
The “student of”, “teacher of”, and “fellow student of” properties in ULAN seem like one potential avenue to explore relationships between artists in the Whitney’s founding collection. Robert Henri, for instance, was a teacher of many Ashcan School artists and an influential figure in the movement. A visualization of the Whitney’s collection data could potentially be created focused around central figures like Henri.
I also wonder whether it might be possible to collect data from a source like Grove Art Online to build relationships. I was just introduced to Grove Art Online in the Art Librarianship class at Pratt, and noticed that it is one of the reference sources used by the Art and Architecture Thesaurus. Grove Art has relationship data on what movements artists are associated with, their patrons and collectors, the materials they used, and people they collaborated with. Grove Art is subscription-based and not openly accessible, however, so I don’t know if it’s acceptable as a source of data.
As well as thinking about ontologies and relationship data, I also looked at how the British Museum has chosen to present its linked data online. In addition to its standard online collection site, the British Museum has its data published in a computer-readable format, organized using CIDOC CRM. Users can access collection data in a variety of resolvable RDF formats, in addition to being able to access an interface for searching the collection using SPARQL queries (http://collection.britishmuseum.org/). The British Museum’s Semantic Web Collection Online may be a good model for publishing the datasets Joshua created this past year, as well as any future data collected during the course of this project.
I’m curious to what extent object metadata can be pulled from TMS. I found these applications, though given that the Whitney already has an online collection, they may be redundant:
http://binder.readthedocs.io/en/latest/user-manual/overview/intro.html
https://github.com/smoore4moma/tms-api
I’m interested in generating csv file(s) from TMS with provenance info for objects in the Founding Collection, similar to how Joshua created birth info and death info files for Founding Collection artists. I’ve started exploring what kind of reports TMS can generate, and how best to combine non-artist constituent data from these reports into a single database.
In TMS, I created four object packets sorted by credit line using Joshua’s object packet for the Founding Collection. I first searched TMS for objects with acquisition dates of 1948 or earlier to make sure Joshua’s Founding Collection packet is complete. I then broke down his packet by whether the object was a gift, purchase, or exchange, or if the record is missing credit.
I only found eight objects in the Founding Collection missing credit lines. These objects have provenance information recorded in the Constituent or Provenance field, but not the Credit Line field:
n.d. collection of the artist; -1931 collection of Gertrude Vanderbilt Whitney, New York, New York; 1931 Whitney Museum of American Art, New York (gift of Gertrude Vanderbilt Whitney)
| Accession number | Provenance |
| 31.229 | 1925- collection of the artist; 1931 Whitney Museum of American Art, New York, New York |
| 31.321 | 1927- collection of the artist; 1931 Whitney Museum of American Art, New York |
| 31.324 | 1930- collection of the artist; 1931 Whitney Museum of American Art, New York |
| 31.335 | 1930 collection of the artist; 1931 Whitney Museum of American Art, New York |
| 31.370 | 1930- collection of the artist; 1931 Whitney Museum of American Art, New York |
| 31.378 | |
| 31.380 | -1929 collection of the artist; 1929-1931 collection of Gertrude Vanderbilt Whitney, New York, New York; 1931 Whitney Museum of American Art, New York (gift of Gertrude Vanderbilt Whitney) |
| 31.964 | n.d. collection of the artist; 1930-1931 collection of Gertrude Vanderbilt Whitney, New York, New York (sold through Frank K. M. Rehn, Inc., New York, New York); 1931 Whitney Museum of American Art, New York (gift of Gertrude Vanderbilt Whitney) |
In looking at objects in the Founding Collection noted in the Credit Line field as being purchased, I’m noticing a significant number are noted in the Constituents field as being sourced/purchased directly from the artist. Since constituents in TMS may have multiple roles (both object-related and acquisition-related), separating galleries/dealers out from artists may be complicated. Checking acquisition-related constituents in the ‘purchase’ package against object-related constituents, possibly using a script or an application like OpenRefine, may be a way to find duplicate constituents. Joshua already created a cleaned-up artist data file, which could be used as a basis for separating out dual object- and acquisition-related artists.
Somehow pulling names from TMS’ free-text fields like Provenance and Notes may be another way to find acquisition-related constituents. I know Prof. Matt Miller created an analyzer tool for the Linked Jazz project at Pratt meant to extract names from oral history transcripts (https://github.com/thisismattmiller/linked-jazz-prototype-transcript). I’m not sure how well it works or whether it’s ever been used outside the context of Linked Jazz, but it might be worth exploring. I could also manually look through the text fields for names. There are only 1,040 purchased works in the Founding Collection, so it might not be prohibitively time-consuming to do so.
I generated a ‘Text Entries’ report for the ‘purchased’ package I made in TMS to get a better overview of what kind of information is in TMS’ notes fields. I haven’t found that much new information on dealers, though I have found some interesting notes on other entities, like the names of people depicted in portraits, as well as names of friends, family members, and collaborators of various artists. I wonder whether collecting data on these people would be worthwhile, or whether it would be better to focus strictly on acquisitions. Additionally, I’ve seen some place names, such as the locations of scenes depicted in landscape paintings.
One issue I’ve noticed in the “Artist Biography-Online Publication” open text field is that these biographies occasionally refer to the wrong artist. I’m not sure if this is an issue with TMS, or an issue of information being generated from the wrong online sources?
I found a Wikipedia page for Frank Knox Morton Rehn, a painter and the president of the Salmagundi Club (https://en.wikipedia.org/wiki/Frank_Knox_Morton_Rehn). Frank K. M. Rehn is noted as the acquisition source for 18 works in the Founding Collection, and VIAF notes F. K. M. Rehn as an alternate name for Frank Knox Morton Rehn (https://viaf.org/viaf/21155834/), but apparently Frank K. M. Rehn the gallerist was Frank Knox Morton Rehn’s son (http://www.aaa.si.edu/collections/frank-km-rehn-galleries-records-9193/more).
I’m wondering if the Subject Terms field in TMS might be an alternative to exploring provenance. It might be interesting to explore possible subject trends in art in the Founding Collection; there seems to be a trend toward figurative work, for instance.
I’ve started a spreadsheet to record acquisition-related constituents for purchased objects in the Founding Collection. I’m noting the constituent’s name, name authority record, objects they are the constituents of, their relation to the object (artist, dealer, etc), links to sites with more information, and possible connections to explore:
https://docs.google.com/spreadsheets/d/1j7QAm3GGaIfDTKSVvVaU9_tGgg6v0f-jMoefZJ9jSdU/edit?usp=sharing
One connection I’ve noticed among a few of the artist constituents is activity in Woodstock and membership in the Woodstock Artists Association. I went up to Woodstock earlier this year to visit the Estate of Philip Guston and to interview their archivist, Emily Jones. Emily is also the archivist for the Woodstock Artists Association, and gave me a lot of historical background on Woodstock and its importance as an artists’ community, something I was only vaguely aware of beforehand. It seems like a fair number of WAA artists are represented in the Founding Collection, so this may be one area to explore further.
DBpedia and ULAN both seem to have richer sets of relationship properties than Wikidata, so they might be more useful resources to query. Robert Henri’s DBpedia page, for instance, has “movement”, “training”, “influenced”, “influenced by”, and “seeAlso” properties. Again, however, pulling data from DBpedia may be beyond my technical capacities at the moment.
The first step for applying this process to Provenance would be to export the “imoec_founding_collection_purchase” package in TMS as a csv file. I haven’t been able to figure out how to generate csv files from TMS yet, but this would be a starting point. I’m just learning to search fields and export JSON files from csv files, so I should at least be able to start the process of cleaning a Provenance csv file.
Python doesn’t seem to be currently installed on my workspace computer, and I keep getting denied permission to install it on my user account. I’m not sure whether it would be better to contact Whitney IT or simply to use my laptop for Python programming. I’m more familiar with working with a Mac environment, so working on my laptop might be easier.
Of note: the Carnegie Museum of Art was recently involved with a data-related project focused on provenance. It led to the creation of a Ruby library for generating provenance records:
http://www.museumprovenance.org/
https://github.com/arttracks/museum_provenance
I’m kind of curious how the tables that make up the backend of the TMS database are organized. I’m taking Database Design this semester, and am eventually going to start working with MySQL in more depth. I’m wondering whether it would be worthwhile to make a MySQL database that could store provenance information in a more normalized form than TMS. I know Joshua already made a MySQL database for outputting URIs, so maybe I could augment his database with Provenance info.
I downloaded and looked at Joshua’s various object and artist csv files, as well as his “lod_test.sql” file. I’m not that familiar with SQL at this point, but I can kind of see how his MySQL database is set up with an Objects and People table. At this point the two tables aren’t connected to one another. It seems like it would make sense to have URIs created from the Objects table relate to URIs created by the People table at some point, however. Object resource URIs from the British Museum’s collection, for example, contain links to Person/Institution URIs relating to the object, such as “has former or current owner”. I’m not sure how to do this from a technical standpoint yet, however. To indicate something like provenance, it seems like it would make sense to try to enter Acquisition-related constituents into the current People table, or to create separate Object-Related and Acquisition-Related Constituent tables, and to somehow connect the Object and People tables when outputting PHP files. I guess just putting Acquisition-related constituents into a tabular format would be a start. Additionally, since Acquisition-related constituents may be institutions like galleries, maybe it would make sense to create an Institutions table, or change People to Person-Institution?
I read about D2RQ (http://d2rq.org/), a tool for converting data from relational databases into an RDF format, and allowing it to be accessed through SPARQL queries. It seems like something beyond my current technical abilities to use, but I wonder whether it could eventually be used to convert a database built up from Joshua’s current one into more interconnected URIs. Additionally, if the Whitney eventually hosts Joshua’s URIs online, this would maybe assist with providing a SPARQL endpoint to query them.
Using MySQL Workbench, maybe I’ll try to create an ER diagram of how the Objects and People tables could be connected, using the British Museum collection as a model. My initial thought would be to create tables representing the predicates in RDF triples. Not sure if it would be possible to convert data from this kind of relational model into linked data, however. Linked data is meant to overcome the hurdles of storing data in relational databases, so maybe using MySQL for this kind of data is counterintuitive? Using MySQL as a triple store is apparently not unheard of, however (http://rdfextras.readthedocs.io/en/latest/store/mysqlpg.html).
Alternately, maybe NoSQL makes more sense as a database system for the project, since it is more commonly used for triple stores. The British Museum seems to use GraphDB to store their triples, which is available as a free download (http://ontotext.com/products/graphdb/). GraphDB can import data stored as .ttl, .rdf, .rj, .n3, .nt, .nq, .trig, .brf, and .owl files. Maybe it would make more sense to apply a script to the Person and Object csv files to generate triples in a format that can be fed into GraphDB?
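As a very rough sketch of what that kind of script could look like (assuming the rdflib Python library; the file name, column names, and URI pattern below are placeholders rather than the actual data), something like this would read a constituents CSV and write Turtle that GraphDB can import:

# Hedged sketch: turn a constituents CSV into Turtle for import into GraphDB.
# The file name, column names, and URI pattern are hypothetical placeholders.
import csv
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

SCHEMA = Namespace("http://schema.org/")
BASE = "http://opendata.whitney.org/constituent/"  # placeholder URI pattern

g = Graph()
g.bind("schema", SCHEMA)

with open("constituents.csv", newline="") as f:
    for row in csv.DictReader(f):
        person = URIRef(BASE + row["ConstituentID"])
        g.add((person, RDFS.label, Literal(row["DisplayName"])))
        if row.get("ULAN"):  # optional link to an external authority
            g.add((person, SCHEMA.sameAs, URIRef(row["ULAN"])))

g.serialize(destination="constituents.ttl", format="turtle")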
I received some overly-aggressive sales emails from GraphDB after downloading a free version of the app, which have kind of put me off from using them. Basically, when I downloaded the free version of the desktop app, someone in the sales department at GraphDB (which is apparently based in Bulgaria) used my name and registration email to look me up on LinkedIn and discovered I was working at the Whitney. This sales rep then emailed me suggesting a phone call with someone else on the GraphDB team, presumably so they could get me to convince the museum to buy a subscription. I basically told the guy I was in the very preliminary stages of my project, and that I had no budget for my project or say in the museum’s software/database system purchases, but that I would keep them in mind for the future. I assume these kinds of emails are typical with museum vendors, but having no personal experience with them, I was really taken aback.
http://erlangen-crm.org/ – CIDOC-CRM ontology mapped onto OWL; used at British Museum
Possible predicate properties for indicating provenance:
CIDOC-CRM doesn’t seem to have a lot of terms related to acquisition, so maybe it would make sense to stick to Joshua’s use of schema.org?
https://schema.org/acquiredFrom
https://schema.org/sourceOrganization
https://schema.org/DonateAction
Next week – work on joining the object and acquisition-related constituents files in Python.
Join on Object ID/Constituent ID, presumably
Also work on MySQL stuff
Do acquisition-related constituents go into the Person table, or do Object-Related and Acquisition-Related Constituents get separated?
How to make more interesting:
Try to map onto Whitney Studio Club materials on DPLA: Would be cool to connect objects to paperwork related to their purchase.
Given that MySQL is not well-suited as a triple store, I think it would make sense to migrate and append Joshua’s data in a non-relational, triple store format. The Linked Jazz Project at Pratt uses Apache Marmotta for their triples, and I’m thinking this would be good as a solution for the Whitney as well. Prof. Pattuelli has said there’s space for me to experiment on the Linked Jazz Marmotta server. The opendata.whitney.org server is another option; additionally, data from one server could always be migrated to another eventually.
Is Marmotta too complicated, though? What is its user interface like? Will discuss at meeting w/everyone on Thursday.
First need to run script that will check gift and donation csv files against artist.csv and remove dupes.
I’m going to start playing around in OpenRefine to see if I can combine some csv files, or at least get rid of duplicate entries within files:
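For reference, here is a rough pandas sketch of the same check (removing gift constituents that already appear in artist.csv); the file and column names are hypothetical stand-ins for the actual files:

# Hedged sketch: drop constituents from the gift file that already appear in
# Joshua's cleaned artist file. File and column names are hypothetical.
import pandas as pd

artists = pd.read_csv("artists.csv")
gifts = pd.read_csv("gift_constituents.csv")

# Normalize names before comparing, since capitalization and whitespace vary.
artists["name_key"] = artists["DisplayName"].str.strip().str.lower()
gifts["name_key"] = gifts["DisplayName"].str.strip().str.lower()

# Keep only acquisition-related constituents who are not also artists,
# then drop exact duplicates within the gift file itself.
non_artists = gifts[~gifts["name_key"].isin(artists["name_key"])]
non_artists = non_artists.drop_duplicates(subset="name_key")
non_artists.to_csv("gift_constituents_deduped.csv", index=False)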
What info does the NYTimes have on Provenance-related constituents? I’m checking the API for anything interesting. Maybe more names (related entities) could be pulled from the NYT’s ‘name.value’ element:
Another project idea would be to query DPLA for digital archival purchase records from the Whitney Studio Club available on DCMNY. It’s kind of cool to see how much these pieces sold for. Poppies by Ernest Fiene, for instance, went for $100:
Not really sure if this would yield great visualizations – could maybe pull images? Also, maybe this would be redundant if TMS object records are already connected to objects in WhitneyCat (although this doesn’t seem to be the case?). I am interested in the issue of how to connect digital archival materials with collection objects they relate to. Maybe adding links to the URLs of the archival documents on DCMNY to the object/person URIs? It is interesting how much of the paperwork for these early purchases/donations is online. The descriptions of some of these archival objects also give some interesting context to the purchases.
Third project idea: map the galleries that contributed to the Founding Collection that are still around, using NY.gov data: https://data.cityofnewyork.us/Recreation/New-York-City-Art-Galleries/tgyc-r5jh
Not many of them are still around though…so maybe not.
Querying Wikidata: Not sure if it could be the basis of a separate project. Their “instance of: art gallery” property is interesting:
https://www.wikidata.org/wiki/Q7990321
Another idea – link constituents to Archives of American Art records, since they seem to have the richest info on early 20th century galleries and have a lot of digitized collections. They don’t have an API, though, so I’m not sure how easy it would be to extract their content. Additionally, the images on their website are loaded through a viewer rather than being downloadable JPEGs:
http://www.aaa.si.edu/collections/aca-galleries-records-8772/more#section_1
Archives of American Art is sometimes the only place I can find info on these galleries. I’ve found some with ULAN and/or VIAF records, but that’s the extent of the content I can pick out.
As of MySQL Version 5.7.8, MySQL can be used to generate JSON. If these JSON files could be put into a triple store, the generation of PHP files in addition to JSON files may not be necessary:
http://dev.mysql.com/doc/refman/5.7/en/json-functions.html
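As a rough illustration of what that could look like against Joshua’s objects table (I haven’t run this; the connection details are placeholders and the column names are guesses based on his existing setup), JSON_OBJECT() in MySQL 5.7.8+ would return one JSON document per row:

# Hedged sketch: have MySQL 5.7.8+ emit JSON directly with JSON_OBJECT().
# Connection details and table/column names are hypothetical guesses.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="whitney",
                               password="secret", database="lod_test")
cur = conn.cursor()
cur.execute("""
    SELECT JSON_OBJECT('objectID', objectID,
                       'title', `Title Sort`,
                       'artist', Artist)
    FROM objects
""")
for (doc,) in cur:
    print(doc)  # one JSON document per object row
conn.close()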
I started a fork of the Whitney’s opendata Github repository for my project files. Not sure if forking or branching is a better approach for keeping my contributions to the project, but I can always change the structure later:
https://github.com/MollieEcheverria/opendata
I’m currently trying to re-model how the Whitney’s linked data is structured. Joshua’s structure uses Schema.org, but I would like to attempt to model the existing data onto CIDOC if possible, in keeping with what the British Museum has done with their linked data, as well as the Carnegie Museum’s model.
I’m starting by trying to create a data model using Draw.io. This is proving a little confusing thus far: https://drive.google.com/file/d/0B2gZKtQxkfUhX19FdXlDUEp2b2c/view?usp=sharing
Additionally, I am trying to convert a sample British Museum object record (http://collection.britishmuseum.org/id/object/EOC3130) from JSON to CSV using Python, in order to try to reverse engineer converting CSV field data to CIDOC-structured JSON. My script attempt(s) are in my GitHub repository.
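For reference, the general shape of that kind of conversion (a simplified sketch, not my actual attempt; it just flattens nested keys into dotted column names):

# Hedged sketch: flatten one nested JSON record into a single CSV row,
# joining nested keys with dots. The file names are placeholders.
import csv
import json

def flatten(value, prefix=""):
    """Recursively collapse nested dicts/lists into {dotted.key: value} pairs."""
    flat = {}
    if isinstance(value, dict):
        for k, v in value.items():
            flat.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(value, list):
        for i, v in enumerate(value):
            flat.update(flatten(v, f"{prefix}{i}."))
    else:
        flat[prefix.rstrip(".")] = value
    return flat

with open("EOC3130.json") as f:
    record = flatten(json.load(f))

with open("EOC3130.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=record.keys())
    writer.writeheader()
    writer.writerow(record)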
Database-wise, since SQL can be used to output JSON, I’m planning to keep using and appending Joshua’s MySQL database for now. If and when a triple store is implemented at the Whitney, JSON-LD/.nt files would need to be fed into it, so the MySQL database would serve as the source for generating these files. This seems to be how Linked Jazz at Pratt generated their triples as well (or at least how they stored the names from the transcripts they analyzed: https://linkedjazz.org/data-productionworkflow-draft/)
I am aiming to have a data model for the Whitney ready and approved by the first week of December.
Once the data model is established, I will query for name authority records for Provenance-related constituents.
As far as where to find authority files for obscure early 20th century dealers and collectors, as I found researching back in September, VIAF seems to have the most consistent results.
Worldcat’s experimental Linked Data also has some great relationship content for records related to some of these constituents, though the type of the relationship is not necessarily well-defined: http://www.worldcat.org/title/ferargil-galleries-records-circa-1900-1963/oclc/888072546
Getting data out of TMS’ free-text Provenance field:
Can use Python string matching to look for dates.
Each provenance note in TMS is separated by a semicolon, making it easy to use Python to sort these notes into distinct events.
The Carnegie Museum just puts their TMS Provenance info into a CIDOC “P3_has_note” field, but it seems like they could easily extract more useful info from these.
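As a small sketch of the semicolon-and-date approach described above (the sample string is one of the provenance notes from the records listed earlier, and the regex only handles simple year patterns):

# Hedged sketch: split a TMS Provenance string on semicolons and pull out years.
# The regex only covers simple four-digit year / year-range patterns.
import re

provenance = ("-1929 collection of the artist; "
              "1929-1931 collection of Gertrude Vanderbilt Whitney, New York, New York; "
              "1931 Whitney Museum of American Art, New York")

events = []
for note in provenance.split(";"):
    note = note.strip()
    years = re.findall(r"\d{4}", note)
    events.append({"note": note, "years": years})

for event in events:
    print(event["years"], "->", event["note"])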
I could use either Python or MySQL to create updated JSON-LD URI files. I’m not sure which method is easier at this point, but will have a better sense after doing more work in my Programming for Cultural Heritage/Database Design classes. MySQL might make sense since SPARQL is SQL-like?
All of these JSON-LD files can then be combined into an N-Triple file (the format used for Linked Jazz’s data), and put into a triple store on the Whitney’s server (which would solve the PHP redirection issue Joshua was having).
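rdflib could probably handle that combination step. A minimal sketch, assuming JSON-LD parsing support is available (the rdflib-jsonld plugin, or a recent rdflib release where it is built in), with placeholder file names:

# Hedged sketch: parse a folder of JSON-LD files into one graph and write
# a single N-Triples file. Assumes JSON-LD support in rdflib; file names
# are placeholders.
import glob
from rdflib import Graph

g = Graph()
for path in glob.glob("jsonld/*.json"):
    g.parse(path, format="json-ld")

g.serialize(destination="whitney_founding_collection.nt", format="nt")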
After everything is online, if there is sufficient time left in the Spring semester, I would use the Whitney’s URIs as the basis of some kind of visualization project to demonstrate how they can be used by the public.
Specifically, it might be good to connect the Whitney’s linked data with another museum’s, the Smithsonian Renwick Gallery being the obvious example.
The Smithsonian’s artist URIs (http://edan.si.edu/saam/id/person-institution/176) could even be linked to in ours.
I created a preliminary data model for the Whitney’s data using the Smithsonian and British museum’s URIs and the Carnegie Museum’s provenance model as references. I also looked at a mapping of CIDOC to LIDO done by researchers at FORTH-ICS in 2010 (http://www.cidoc-crm.org/Resources/the-lido-model), and compared it to the mapping of the Whitney’s TMS data to LIDO detailed in the Whitney Content Standard Element Sets.
One issue I’ve noticed in trying to map data to CIDOC is that CIDOC doesn’t seem to provide individual namespace URIs for its different classes and properties. Instead, these terms are all stored in a single namespace file: (http://www.cidoc-crm.org/sites/default/files/cidoc_crm_v5.0.4_official_release.rdfs.xml)
Because of this issue, the Smithsonian’s links to CIDOC-CRM term resources come up dead (http://edan.si.edu/saam/id/object/1997.70/acquisition), while the British Museum uses an OWL-based CIDOC mapping that just downloads the entire ontology as a single RDF namespace file rather than linking to individual term URIs. This poses a problem, as creating triples that can’t be properly referenced to a URI would violate basic linked data principles.
One solution to this issue would be to map terms from one or more external vocabularies onto the CIDOC structure. Prof. Pattuelli suggested Linked Open Vocabularies as a resource to research some of these: https://lov.okfn.org/dataset/lov/
It looks like there was an attempt to map CIDOC to Dublin Core back in 2000 (http://www.cidoc-crm.org/sites/default/files/dc_to_crm_mapping.pdf), but that mapping doesn’t seem to address CIDOC Event entities at all. Dublin Core does have an Event class, however (http://purl.org/dc/dcmitype/Event), so it seems like it wouldn’t be impossible to map Dublin Core terms onto CIDOC’s event-based structure.
Prof. Miller gave my Program for Cultural Heritage class read-only access to the NYPL’s archives database. Looking at the way this data is structured, particularly access terms for constituents, is helpful in thinking about tabular structure for the Whitney’s linked data.
There are three tables in the NYPL Archives schema: `collections`, `access_terms`, and `access_term_associations`.
The `access_term_associations` table is used to link the individual archival collections, stored in the `collections` table, to names of constituents, which are stored in the `access_terms` table:
This structure allows the NYPL to generate some interesting interactive interfaces: http://archives.nypl.org/tools
An archival collection from this database represented in JSON:
{
  "id" : 1,
  "title" : "Thomas Addis Emmet collection",
  "origination" : "Emmet, Thomas Addis,\n 1828-1919",
  "org_unit_id" : 1,
  "date_statement" : "1483-1876 [bulk 1700-1800]",
  "extent_statement" : "30.83 linear feet; 108 boxes, 21 volumes",
  "linear_feet" : 30.83,
  "keydate" : 1483,
  "identifier_value" : "927",
  "identifier_type" : "local_mss",
  "bnumber" : null,
  "call_number" : "MssCol 927",
  "pdf_finding_aid" : "",
  "max_depth" : 3,
  "series_count" : 28,
  "active" : 1,
  "created_at" : "2013-01-08 20:52:54",
  "updated_at" : "2015-11-05 03:03:37",
  "boost_queries" : "[\"emmet\"]",
  "date_processed" : null,
  "component_layout_id" : 2,
  "has_digital" : 1,
  "featured_seq" : null,
  "fully_digitized" : 1,
  "show_generated_pdf" : 0,
  "status_note" : null
},
These archival materials don’t have URIs associated with them directly, presumably since they are organized at collection level.
A constituent or concept related to this collection, including a link to a name authority record:
{
  "id" : 3004,
  "term_original" : "Legislators--United States",
  "term_authorized" : null,
  "term_type" : "topic",
  "authority" : "lcsh",
  "authority_record_id" : null,
  "value_uri" : "http://id.loc.gov/authorities/subjects/sh85075851",
  "control_source" : null,
  "created_at" : "2013-01-08 21:01:45",
  "updated_at" : "2013-01-08 21:01:45"
},
The access terms association that shows the relationship between the collection and concepts/people related to it:
{
  "id" : 7246,
  "describable_id" : 1,
  "describable_type" : "Collection",
  "access_term_id" : 3004,
  "role" : null,
  "controlaccess" : 1,
  "name_subject" : 0,
  "created_at" : "2013-01-08 21:01:45",
  "updated_at" : "2013-01-08 21:01:45",
  "function" : null,
  "questionable" : 0
},
Because there may be many terms associated with each collection, and since any given term may apply to multiple collections, the access terms association table exists to represent this many-to-many relationship.
For the Whitney’s data, event table(s) could serve a similar role as the access terms association table for the NYPL, connecting constituents and objects as well as places and thesaurus terms.
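A rough sketch of what that could look like as tables (sqlite3 here purely for illustration; the table and column names are my own guesses, not Joshua’s actual schema):

# Hedged sketch: an acquisition-events table playing the same associative role
# as NYPL's access_term_associations table. Names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects      (object_id      INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE constituents (constituent_id INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE acquisition_events (
    event_id       INTEGER PRIMARY KEY,
    object_id      INTEGER REFERENCES objects(object_id),
    constituent_id INTEGER REFERENCES constituents(constituent_id),
    role           TEXT,   -- e.g. 'seller', 'donor'
    event_date     TEXT
);
""")

# A purchase event linking one object to the dealer it was bought from.
conn.execute("INSERT INTO objects VALUES (1, 'My Egypt')")
conn.execute("INSERT INTO constituents VALUES (1, 'Kraushaar Galleries')")
conn.execute("INSERT INTO acquisition_events VALUES (1, 1, 1, 'seller', '1931')")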
While non-relational databases are the standard for storing linked data, since linked data is all about relationships, it seems reasonable that a relational database could be used to represent those relationships as well. There is still the issue of hosting, however, which is presumably where the use of a triple store would be needed.
This site (https://sites.tufts.edu/liam/) gives a great overview of some methodologies of implementing linked data in archival settings, and also argues in favor of using a relational database as the basis for generating triples.
The article mentions D2RQ, the tool for publishing relational data as RDF that I explored briefly and unsuccessfully tried to install earlier in the semester. I’m not sure if I would have the technical ability to install it, but if IT at the Whitney could implement it like they did Joshua’s PHP server, I could develop a MySQL database and host it on this server, solving the triple store issue and having a SPARQL endpoint available as well.
The article also mentions a project called ReLoad, which seems to involve URIs and a SPARQL endpoint generated by/stored in xDAMS. The URI links for this project seem to be dead, however, and I’m somewhat unclear about how everything within the project is structured.
I ended up creating a test MySQL database to input data into. I uploaded a SQL file for this database to GitHub.
https://app.graphenedb.com/dbs
http://www.linkeddatatools.com/introducing-rdf
http://blog.datagraph.org/2010/04/rdf-nosql-diff
https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSIndex
http://wifo5-03.informatik.uni-mannheim.de/bizer/d2r-server/publishing/
https://sourceforge.net/projects/trdf/files/?source=navbar
Hearing Alex Provo speak in our Art Documentation class on the Drawings of the Florentine Painters linked data project has given me a lot of inspiration, both for the handling of the Whitney ontology and for how to normalize and store linked data triples. I am anxious to see how the official launch of the project goes in February, as it would be an excellent model for how to handle Whitney object data. It would also seem logical to try to incorporate this model into things like online catalogues raisonnés and museum catalogues as well.
After reading more on Drawings of the Florentine Painters, I think the project offers a very concise model for how to proceed with the Whitney’s linked data in the spring.
While I was originally thinking to just use CIDOC entities and properties plus Art and Architecture Thesaurus controlled terms, I can see now that incorporating ULAN/VIAF authority files into the Whitney model would be beneficial, particularly given that CIDOC only has a namespace page for its properties rather than URIs.
After querying Wikidata for my project in Programming for Cultural Heritage, I don’t think it would be a good name authority source for the Whitney. During my early exploration of acquisition-related names related to the Whitney Founding Collection, I found that even some of the most obscure early 20th century art dealers had VIAF or LCSH files, whereas I doubt Wikidata would have much information on these constituents.
Geonames would also be a good source for place names.
My work in Spring will be roughly structured around how Alex Provo et al. have described their process in their soon-to-be-published write-up on the Drawings of the Florentine Painters project.
I first need to combine all of my and Joshua’s data into three main CSV files, constituents, objects, and events. This can be done with Google Sheets.
I might also need to query TMS for some additional constituent data.
I would then use OpenRefine and possibly Python to handle any discrepancies in the data (splitting, cleaning, merging, etc).
I could either keep this data in CSV files or import it into a MySQL database. A relational database might be easier to manage and append, and might also be more useful for representing relationships, but could take time to build.
I might refine the Whitney model based on the data at this point. The Florentine Renaissance Painters project experimented with incorporating equivalent properties from other ontologies like Dublin Core along with CIDOC, although this idea was ultimately scrapped. I’m taking Metadata: Description and Access in the Spring, so I may have a better sense of whether to include other schemas and what other schemas to incorporate by February.
The Florentine Renaissance Painters project used two applications to map its tabular data onto CIDOC: Mapping Memory Manager (3M) and Karma
3M (http://139.91.183.3/3M/FirstPage):
3M (which unfortunately has kind of a buggy website) is a web-based tool for managing mapping definition files.
This document (http://83.212.168.219/DariahCrete/sites/default/files/mapping_manual_version_4g.pdf) has more details on how to use 3M.
The aforementioned PDF also mentions this hierarchical representation of CIDOC in Stanford’s WebProtégé, which is a useful representation of CIDOC classes ranked from general to specific: http://webprotege.stanford.edu/#Edit:projectId=6fe69ce8-94b9-4624-bfe6-43af7c6d0fe3
3M is built specifically for CIDOC mappings. As my model for the Whitney is based on CIDOC, I can use the site to map the Whitney’s linked data.
3M has over 500 different mappings hosted at present, primarily CIDOC-based. These include mappings of LIDO and Dublin Core onto CIDOC, models based on internal relational database content, and even what looks like someone’s attempt to map the TMS eMuseums module:
Schemas are uploaded as XML. You can also export other people’s data mappings, see detailed comparisons of different mappings of the same source schema, and view detailed analysis of your schema:
To plug in the Whitney’s data, I would first need a source schema in XML form. This would be sourced from the TMS fields in whatever tabular data I have. I’m still a little unclear on how to do this; Alexandra apparently used a Python script to convert the column names in her tabular data to XML. 3M has an XML schema called X3ML that it provides as template for source schemas, so I could also probably just manually plug fields into this template. The documentation for 3M also explains how to use joins to convert relational database source data to a usable XML source schema.
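I’m not sure exactly what her script did, but a minimal sketch of the general idea, turning the column headers of a CSV into a bare-bones XML skeleton, might look like this (the element names and file names are my own placeholders, not the actual X3ML template):

# Hedged sketch: wrap the column headers of a CSV as elements in a bare-bones
# XML skeleton. Element and file names are placeholders, not real X3ML.
import csv
import xml.etree.ElementTree as ET

with open("founding_collection_purchase.csv", newline="") as f:
    headers = next(csv.reader(f))

root = ET.Element("record")
for name in headers:
    # XML element names can't contain spaces, so swap them for underscores.
    ET.SubElement(root, name.strip().replace(" ", "_"))

ET.ElementTree(root).write("source_schema.xml", encoding="utf-8", xml_declaration=True)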
3M also requires a URI policy generator XML file, which it uses to create URIs associated with whatever domain the source schema is hosted on (/whitney/collection/object/23928, etc).
Once these two files are uploaded to 3M, I can use the site’s interface to associate each TMS field with a CIDOC class or property. 3M can also validate your mapping and suggest other class and entity mappings.
The final mapping (or Target Record) can then be exported from 3M as RDF/XML, N-Triples, or Turtle.
Karma (http://usc-isi-i2.github.io/karma/)
Karma is a data integration tool that can be used for database data, spreadsheets, XML, and JSON. Karma automates the process of adding URIs to this data and mapping it to an ontology.
This document (http://www.isi.edu/~szekely/contents/papers/2013/eswc-2013-saam.pdf) describes its use in greater detail.
I would start by preparing the Whitney’s tabular data for import into Karma, either directly from a CSV file, or from a SQL database if I choose to use one. Depending on what storage format I choose, I would either concatenate columns in MySQL or use OpenRefine.
Karma is a desktop app, something like MySQL Workbench. You work with and manage data locally, but this can later be hosted on an external server.
Karma can also run using data from a hosted SQL database. If I choose to store my initial data in a SQL database, I could opt to host this on the internal Whitney server Joshua used this past year and access Karma through there.
I would then use Karma to map the various columns of data onto classes from the CIDOC mapping I refined in 3M.
Karma also integrates the ability to link data to external resources.
VIAF/LCSH/ULAN: For Acquisition-related constituents. Joshua already used ULAN for Object-related constituents, but it might be worth enriching his data with VIAF/LCSH data as well.
Geonames: For places related to events/constituents
DBpedia/Wikidata: I don’t know how many of the Whitney’s non-artist constituents (in particular Acquisition-related constituents) would have records on these sites, but they might be worth investigating if time permits.
Data from Other Museums (The Smithsonian, British Museum, etc): This actually might be the most important enrichment data to include. One of the main goals of implementing linked open data at the Whitney would be to enable the sharing of resources with other art-related cultural institutions, so this integration is key.
Once all the data is prepared, it can then be hosted in some kind of graph database.
Drawings of the Florentine Painters is using Metaphacts (http://www.metaphacts.com/), which seems like an attractive solution.
This would either be hosted on the Whitney server, or possibly on the server of whatever company I use.
Time permitting, I will create some kind of visualization project with the Whitney’s data using Gephi/Tableau.
I could also try to incorporate image content from TMS/the main Whitney Collection page in some way.
Access server – Access granted – Linux
3M – tricky. Maybe not necessary
Alex – can come in to look at project
Look at what is on the server – is there stuff on there Josh added that is not mirrored elsewhere
Look at the namespaces now/early
Maybe just Python would be easier than 3M/Karma
I was unfortunately unable to set up access to Joshua’s Whitney database server before the break. I have an appointment to troubleshoot the issue with Alison on Thursday, January 5th at 3.
My first attempt at gathering name authorities for the founding collection will involve querying VIAF, as they seemed to have the most consistent records for the somewhat obscure Acquisition-related constituents during my initial search.
VIAF has a pretty standard API (https://platform.worldcat.org/api-explorer/apis/VIAF), which seems like it should be straightforward to query. Since I don’t have any IDs from other authorities to start with, I would use the Authority Cluster Auto Suggest method.
An initial Authority Cluster Auto Suggest search for Briggs Buchanan, a constituent from the Gift list, yields this record (http://viaf.org/viaf/64028666/) which seems to be for the right person (died in 1976, seems to have been an art history scholar).
I created a Python script to query VIAF using the Auto Suggest method, which seems to have been a success.
Both VIAF and LC URIs are now saved in CSV files for the Gift and Purchase-related constituents.
These URIs seem for the most part to be accurate, although a few point to the wrong entity (Edward Root to his foundation instead of his personal name, Thomas Donnelly to the wrong Thomas Donnelley).
VIAF URIs sometimes contain links to the person’s corresponding URI on Wikidata and ULAN, so querying these sites may be a next step.
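For reference, the basic shape of the query (a simplified sketch rather than my actual script; the response field names like ‘viafid’ and ‘lc’ are what I’ve observed and may vary by record):

# Hedged sketch of a VIAF AutoSuggest lookup; the response field names
# ('viafid', 'lc') are what I've seen in results and may vary by record.
import requests

def viaf_autosuggest(name):
    resp = requests.get("https://viaf.org/viaf/AutoSuggest",
                        params={"query": name}, timeout=10)
    resp.raise_for_status()
    results = resp.json().get("result") or []
    return results[0] if results else None

match = viaf_autosuggest("Briggs Buchanan")
if match:
    print("VIAF:", "http://viaf.org/viaf/" + match["viafid"])
    print("LC:  ", match.get("lc"))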
After my success with querying VIAF last week, I’d like to explore incorporating person/institution URIs from other institutions into the Whitney’s constituent URIs as well.
The Florentine Renaissance Drawings project plans to incorporate linked data and images from the British Museum into its URIs. Linking the Whitney’s data to external sources could have similar enrichment value.
The Smithsonian American Art Museum seems like it would be the most obvious source for enrichment. The SAAM has internal URIs for all of the artists in its collection, many of which include bios and images.
The Smithsonian Archives of American Art also seems like it could provide some enrichment value. The SAAA’s website looks to have been redesigned and relaunched within the last month or so, and the site has persistent URIs for many archival items related to the Whitney Founding Collection.
Connecting the URIs of archival documents to constituent URIs may be somewhat of a challenge, however. CIDOC’s event structure could be applied to archival documents in a similar way to how it is applied to art objects (http://ceur-ws.org/Vol-1117/paper5.pdf). CIDOC’s P129 is about (is subject of) could be used to connect various types of E73 Information Object(s) (ie archival documents) to constituents (as per the Carnegie Provenance Model). This may be better left to a later stage in the project when object and constituent URIs have already been created, but it would be an interesting challenge. Connecting art collection items to archival documents, which was one of the primary goals of Victoria’s and my Art Documentation project this past fall, is something that continues to interest me, and it would be interesting to explore how linked data could facilitate these connections.
The SAAM doesn’t have an API, but it does have a SPARQL endpoint.
Unfortunately, the SAAM’s person/institution URIs do not include a literal value for the person’s name, meaning I can’t use SPARQL to search for the names of Whitney constituents. These URIs also don’t include any links to outside name authorities like ULAN, VIAF, or Wikidata. This seems like a pretty bad linked data model, as it makes it difficult to connect constituents in the SAAM collection to non-Smithsonian resources.
Given these limitations, I’m going to try manually scraping the SAAM’s Browse Collections to see if I can extract any useful URIs.
I’m now set up with access to the Opendata Server via phpMyAdmin. Not set up with access in MySQL Workbench yet, but can upload/download via online CMS.
At the most recent Linked Jazz meeting, Karen noted that Wikidata now has Social Networks and Archival Context (SNAC) IDs for some URIs: (https://www.wikidata.org/wiki/Q188969)
Not sure how many Whitney constituents would have Wikidata URIs/SNAC URIs, but this might present some interesting enrichment opportunities.
Guy Pène du Bois, for instance, has a SNAC URI: http://socialarchive.iath.virginia.edu/ark:/99166/w6pv7c36
The Whitney Studio Club: http://socialarchive.iath.virginia.edu/ark:/99166/w6cz7999
Gertrude Vanderbilt Whitney: http://socialarchive.iath.virginia.edu/ark:/99166/w6805436
SNAC is still a prototype however, and its data seems to come from VIAF, the LC, and WorldCat.
Wikidata also doesn’t seem to have many SNAC links incorporated yet. Guy Pène du Bois has a SNAC ID, but his Wikidata page does not link to it.
This Google Sheets/OpenRefine tutorial was also mentioned in the Linked Jazz meeting: http://blog.silk.co/post/127234807482/from-ombd-to-gender-data-on-film-directors-how-to
See also:
Data Journalism Tools Part 1: Extracting and Scraping Data
Tools for Data Visualization Part 2: Cleaning Data
Tools for Data Visualization part 3: Enhancing Data
The Whitney has just activated the 2016 version of TMS (upgraded from 2012). Due to my lack of admin privileges, I cannot uninstall the old version of TMS and replace it with the new. I don’t use TMS that frequently, but this issue is probably worth fixing nevertheless.
Set up joint meeting w/Cristina and Matt
Combine my and Josh’s Constituent sheets
Draft recommendation on name authorities to include in Whitney LOD (based on querying, Smithsonian, etc).
Create some kind of visualization (with Tableau?)
At Cristina’s suggestion, I used Doodle to try to schedule a meeting with everyone.
She also suggested everyone could Skype rather than meeting in person. Given everyone’s conflicting schedules, and that Prof. Pattuelli will be at the ARLIS Conference in New Orleans at the beginning of February, this may be a good solution.
My laptop, on which I’ve been running Python queries, and on which I have the OpenRefine desktop client installed, is unfortunately being repaired, but I’m planning to do some work on tabular data today.
I’m not sure if my user privileges on the Whitney’s computers will enable me to install OpenRefine on my desktop computer here, but if nothing else I can work in Google Sheets.
Unfortunately, it is looking like my work is going to be limited to Google Sheets today.
Initially, I was a little confused how the two tables (Objects and Constituents) in Joshua’s MySQL database were related to each other, as their SQL doesn’t indicate any foreign keys. Constituent ID seems like it would be a natural foreign key in the Object table, for example.
As it turns out, Joshua used a PHP script to create a join:
<?php
$query = "SELECT objects.*, people.constituentID FROM `objects` JOIN people ON people.displayName = objects.Artist WHERE `objectID` =".$objectID;
$result = $conn->query($query);
if (!$result) {
    die($conn->error);
} else {
    $row = $result->fetch_assoc();
};
// print_r($row);
$name = $row['Title Sort'];
$creator = $row['Artist'];
$artform = $row['artform'];
$artMedium = $row['artMedium'];
$artworkSurface = $row['artworkSurface'];
$spatial = $row['Dimensions'];
$dateCreated = $row['Date'];
$accrualMethod = $row['Credit Line'];
$constituentID = $row['constituentID'];
?>
I’m not really familiar with PHP, so I don’t understand the rationale of doing this join with PHP versus making the tables relational with primary/foreign keys in SQL.
More on generating JSON from a MySQL database using PHP:
http://www.kodingmadesimple.com/2015/01/convert-mysql-to-json-using-php.html
Indexing a Generated Column to Provide a JSON Column Index:
Summary Overview of using MySQL or PostgreSQL as a triple store:
http://rdfextras.readthedocs.io/en/latest/store/mysqlpg.html
One random note – I somehow didn’t realize that PURL stands for Persistent Uniform Resource Locator
I’m still waiting on a response from Matt regarding a meeting with everyone to discuss the Fellowship. I will ask him about it at the next Linked Jazz meeting on Thursday.
I made some good progress on Friday in working on a master spreadsheet with all Object and Acquisition-related constituents for the Founding Collection.
One issue I noticed in looking at Joshua’s data was three suites of prints in the founding collection (31.694.1-10, 33.83.1-6, 34.37.1-6). The three suites only had one Object ID per suite, despite the fact that the prints within them are separate objects with separate creators.
Additionally, there were a smattering of artists with work in the Founding Collection who were not listed in Joshua’s Artist Data spreadsheet, and who did not have Whitney IDs created for them.
I’ve started by trying to map Joshua’s Object-related constituents onto a CIDOC event structure.
CIDOC’s official URIs lead to dead links, despite an apparent plan by the International Council of Museums to implement redirects.
CIDOC does have the schema available as an RDFS file. I need to do more research on using an RDFS file as a persistent namespace, but that could be an option.
“ A source of honest confusion, however, is that RDF can be expressed as XML. Lassila’s note regarding the Resource Description Framework specification from the World Wide Web Consortium (W3C) states, “RDF encourages the view of ‘metadata being data’ by using XML (eXtensible Markup Language) as its encoding syntax.”4 So even though RDF can use XML to express resources that relate to each other via properties, identified with single reference points (URIs), RDF is itself not an XML schema. RDF has an XML language (sometimes called, confusingly, RDF, and from here forward called RDF/XML). Additionally, RDF Schema (RDFS) declares a schema or vocabulary as an extension of RDF/XML to express application-specific classes and properties.5 Simply speaking, RDF defines entities and their relationships using statements. There are various ways to make these statements, but the original way formulated by the W3C is using an XML language (RDF/XML) that can be extended by an additional XML schema (RDFS) to better define those relationships. Ideally, all parts of that relationship (the subject, predicate, object, or the resource, property, property value) are URIs pointing to an authority for that resource, that property, or that property value.”
Hardesty, J. (2016). Transitioning from XML to RDF: Considerations for an Effective Move Towards Linked Data and the Semantic Web. Information Technology & Libraries, 35(1), 51-64. http://search.ebscohost.com.ezproxy.pratt.edu:2048/login.aspx?direct=true&db=llf&AN=114479090&site=ehost-live
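To make that concrete for myself, the same single statement can be written out in either serialization. A small rdflib sketch (the Whitney object URI here is just an illustrative placeholder):

# Hedged sketch: one triple serialized both as Turtle and as RDF/XML.
# The opendata.whitney.org URI is an illustrative placeholder.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDFS

g = Graph()
obj = URIRef("http://opendata.whitney.org/object/12345")
g.add((obj, RDFS.label, Literal("My Egypt")))

print(g.serialize(format="turtle"))
print(g.serialize(format="xml"))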
For now, I’m following suit with the British Museum and using the Erlangen mapping of CIDOC to OWL for CIDOC class namespaces.
Erlangen’s URIs, however, just initiate a download of a namespace file with the whole schema, rather than leading to URIs for individual classes and properties.
Another solution I’ve considered is using Dublin Core terms to fill in for CIDOC classes/properties, as Dublin Core does provide persistent URIs for terms in its ontology.
There’s also this OWL ontology for provenance:
https://www.w3.org/TR/prov-o/#description
http://openorg.ecs.soton.ac.uk/wiki/Linked_Data_Basics_for_Techies
Another XML mapping:
http://ws.nju.edu.cn/falcons/ontologysearch/details/recommendation.jsp?id=13022081
I am continuing to work on cleaning the Whitney’s constituent data in OpenRefine and Google Sheets.
I started looking into Google Fusion Tables, which seems like a good alternative to Tableau as a geocoding application.
Possible source for data cleaning/reconciliation info:
As per this discussion (https://groups.google.com/forum/#!topic/openrefine/GwCGTM3NGOQ), there is apparently a version of OpenRefine specifically designed for Linked Data:
https://sourceforge.net/projects/lodrefine/?source=navbar
I also borrowed an e-Edition of this book:
http://book.freeyourmetadata.org/
Notes from Hooland, S. v., & Verborgh, R. (2014). Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata. London: Facet Publishing:
Page 23: The authors recommend one table per entity for a linked data database, as per basic relational database guidelines. In my database, I’m thinking of Events (artwork creation, donation, purchase, etc) as entities. Indeed, in CIDOC, the E5 Event class is a sub-sub-sub-subclass of E1 Entity.
Any CIDOC class could potentially be its own table by this thinking.
My current tables correspond to the following CIDOC Entities:
Subtypes of events could potentially be their own tables:
Role E55 Type could be its own table, as the same actor may have multiple roles in relation to the same object
The Role or Event tables could basically be like associative tables (https://en.wikipedia.org/wiki/Associative_entity)
Since Joshua has artist birth and death dates and location:
Basically, with CIDOC, I should think of things in terms of Class=Table and Property=Column, to represent things with a relational structure that makes sense.
Or rather, each table is a CIDOC Class (AKA entity). Within tables, each Property is a column. These columns are populated by Classes, unless there exists a many-to-many relationship between the Class entities, in which case they get their own table.
Subject = Table
Predicate = Column
Object = URI or literal
As of the latest release of CIDOC issued this month, the E82 Actor Appellation Class has been deprecated in favor of the generic E41 Appellation
CIDOC property P48 has preferred identifier (is preferred identifier of) should define the primary key of any given table
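As a rough sketch of this Class=Table, Property=Column idea (my own simplification; the table and column names below are hypothetical shorthand, not an official CIDOC-to-SQL mapping):

# Rough sketch: one CIDOC class as one table, with its properties as columns.
# Table and column names are hypothetical shorthand for illustration only.
import sqlite3

conn = sqlite3.connect("founding_collection_sketch.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS E8_Acquisition (
    p48_preferred_identifier TEXT PRIMARY KEY,  -- P48 has preferred identifier
    p24_transferred_title_of TEXT,              -- points to an E22 object identifier
    p23_transferred_title_from TEXT,            -- points to an E39 Actor (donor/seller)
    p22_transferred_title_to TEXT,              -- points to an E39 Actor (the Whitney)
    p4_has_time_span TEXT                       -- an E52 Time-Span (e.g. acquisition year)
)
""")
conn.commit()
conn.close()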
For now, I am going to focus on making sure all my table rows conform to a CIDOC property
The CIDOC Class Hierarchy: http://cidoc-crm.org/cidoc_graphical_representation_v_5_1/class_hierarchy.html
Database Setup
I decided to join METRO, through which I’m hoping to get access to Lynda.com and take some online courses in database management.
Specifically, I’m interested in this course (https://www.lynda.com/NoSQL-tutorials/NoSQL-SQL-Professionals/368756-2.html), which is focused on NoSQL for people with SQL backgrounds, and which touches specifically on the management of museum data.
There’s also this course as well: (https://www.lynda.com/NoSQL-tutorials/Up-Running-NoSQL-Databases/111598-2.html)
METRO only has 8 licenses, however, and I’m not sure what the turnaround time for these is, so I may just sign up for Lynda independently.
UPDATE – Pratt apparently provides access to Lynda as well:
http://libguides.pratt.edu/lynda
Interoperability w/ SAAM – what resources does SAAM have that Whitney doesn’t?
SAAM URIs for artists tend to have photos – do something with photos?
Distinguish between Person and Corporate Body
I’m continuing the work I started last week, focused on normalizing my and Joshua’s combined data by splitting each CIDOC entity class into its own sheet (which will be the basis of a MySQL table eventually)
As I work on the data in Google Sheets, I am creating an ER Diagram in MySQL Workbench to keep track of what data is in what sheet/table:
Hooland, S. v., & Verborgh, R. (2014), p. 29 mentions the Open Graph protocol:
Hooland, S. v., & Verborgh, R. (2014) touch on the virtues of XML vs. JSON for encoding linked data. Joshua opted for JSON encoding in his work, and Prof. Pattuelli seems to prefer it as well.
According to Hooland and Verborgh, p. 42-43, data exchange on the internet occurs mostly in JSON, and JSON has an inherently hierarchical structure.
Regardless of whether JSON or XML is used to encode linked data, outside users have no idea what elements mean without the existence of a namespace.
Hooland and Verborgh are in favor of Turtle as an encoding syntax (p. 47)
Hooland, S. v., & Verborgh, R. (2014), p. 52
Fortunately, CIDOC seems to have finally added back pages for its individual entity classes and properties:
http://www.cidoc-crm.org/Version/version-6.2
Unfortunately, I don’t know if these pages can be considered persistent URIs, since each entity and property page URL is suffixed with “/version-6.2”, and pages will not redirect without it:
http://www.cidoc-crm.org/Entity/e2-temporal-entity/version-6.2
http://www.cidoc-crm.org/Property/p118-overlaps-in-time-with/version-6.2
Still, it is helpful at least in the conceptual mapping of the Whitney’s data to be able to quickly view the properties that can be applied to each entity class, and to view subclasses and superclasses of entities, without having to manually scroll through a huge PDF.
I finally finished separating data for each CIDOC Entity class with a many-to-many relationship with another into its own table/Google Sheet. I went pretty granular, ending up with 14 different sheets:
Used for non-purchase acquisition events.
Used to represent the creation of an artwork.
(this could be problematic if used outside the context of the Founding Collection for newer items that don’t have a physical component)
Used to represent art objects in the Whitney’s collection.
Used to represent the unique Whitney identifier for any kind of Constituent (Object, Acquisitions, and Ex-Collections-related).
Used to record non-Whitney unique identifiers for Constituents and the name authority domains to which they belong (currently LC, VIAF, ULAN, Wikidata; potentially the Smithsonian American Art Museum in the future).
Used to record external URIs for Constituents on name authority sites.
Used to record the names and GeoNames IDs for the birth and death locations of artists. Could also be used for any other place identifiers like object locations.
Used to record dimensions of objects. Objects may have more than one listed dimension in TMS (framed vs. unframed, inches vs. cm), hence the need for a separate table.
Used to record role type (artist, donor, art dealer, etc) of an Actor in a given event. Vocabulary for these roles is from the Getty AAT.
Used to record the material components of art objects.
Used to record Constituent birth dates and locations.
Used to record Constituent death dates and locations. Also includes links to the NYTimes Obituaries Joshua found.
(deprecated in the latest version of CIDOC still in development [Version 6.2.2] in favor of the broader E41 Appellation)
Records the Display Names of Constituents along with any alternative forms.
(a new Event class being added in CIDOC Version 6.2.2)
Used to record information about the purchase of objects.
I could definitely simplify this conceptual model at a later point, but I figure that since the main audience for this data would be art historical researchers, maybe more specificity would be beneficial.
Neo4j apparently allows you to import CSV files directly and make table joins with their SQL-like query language called Cypher:
https://neo4j.com/developer/guide-importing-data-and-etl/
Given that, I may just import my CSV data directly into Neo4j after cleaning it in OpenRefine rather than bothering to make a MySQL database first. Starting with an ER diagram may be helpful, however.
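If I do go that route, a minimal sketch of what the import might look like from Python (the connection details, file name, and column names are placeholders, and the CSV would need to sit in Neo4j’s import directory):

# Sketch: load a cleaned CSV into Neo4j with LOAD CSV via the Python driver.
# URI, credentials, file name, and column names are all placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

load_constituents = """
LOAD CSV WITH HEADERS FROM 'file:///constituents.csv' AS row
CREATE (c:Constituent {id: row.ConstituentID, name: row.DisplayName})
"""

with driver.session() as session:
    session.run(load_constituents)
driver.close()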
Neo4j is also designed as more of a “big data” database.
CouchDB is another popular NoSQL system I see mentioned a lot.
CouchDB doesn’t support SPARQL either, however, instead relying on a JSON-based query language
The Metaphacts database used by the Florentine Renaissance Drawings project is built on top of a Blazegraph database.
Blazegraph is available as an open-source download. It also supports SPARQL queries.
It does seem somewhat harder to set up than something like Neo4j, however.
Update: I installed Blazegraph easily, and it has a straightforward local interface. However, I’m not really sure how applicable it is to this project, if at all.
At first glance, OrientDB seems like it might have an easier to use interface. I will download it and give it a shot.
OrientDB took some time to install. You have to install a server and console client locally to use it:
http://orientdb.com/docs/last/
At first glance, OrientDB seems more familiar and usable than previous graph databases I’ve looked at.
You can define Classes, for example (http://orientdb.com/docs/last/)
I’m not really sure if and how OrientDB supports linked data, however, which is obviously an issue
The number of database options is kind of overwhelming, quite frankly.
This is a helpful overview:
http://db-engines.com/en/ranking/rdf+store
As per this upcoming Museums and the Web workshop (http://mw17.mwconf.org/proposal/innovative-applications-and-data-sharing-with-linked-open-data-in-museums-exploring-principles-and-examples/), Omeka apparently supports RDF exporting in some form.
I’ve never used Omeka, but will be using it in one of my courses later this semester. I’m not sure how viable it is for publishing linked data, but it’s worth exploring.
Also interesting:
http://mw17.mwconf.org/proposal/thinking-in-cidoc-crm/
CSV -> RDF Lib
Deliverable – report on how to model provenance
Gephi
Art dealers -> starting point for other connection
Ask Maggie more about provenance; where do they get their provenance info?
Don’t stress about databases
Opportunity to question how museum does things
Movement of objects over 10 year span
Provenance in different schemas (Dublin Core, etc)
Interview people
Method of provenance
Realistically, it’s not really within the scope of the fellowship (or my technical abilities) to try to implement a database like I’ve been hung up on.
A good goal for the project would be to present a few examples of what the Whitney’s linked data could be, and to outline the methodology of how I came up with these model(s).
As far as a visual/deliverable, a Gephi network graph is always a good option.
Additionally, if I’m seeking to narrow down what elements I want to incorporate into the Whitney’s dataset, it might be helpful to talk to someone who works with the museum’s collection files (ie Maggie) or someone in the curatorial department to get a better sense of what the museum’s needs are, and how linked data could assist researchers and museum staff in accessing the information they need.
The Getty has the sales records of Knoedler Gallery available as a CSV file on Github. They also have a ton of other provenance tools:
Overview of provenance-related datasets and search tools at the Getty:
http://www.getty.edu/research/tools/provenance/search.html
The Getty’s Github repository, where they plan to eventually make other provenance datasets available:
https://github.com/gettyopendata/provenance-index-csv
http://www.getty.edu/research/tools/provenance/faq.html#download
A Gephi-style network diagram:
http://www.getty.edu/research/tools/provenance/zoomify/index.html
http://piprod.getty.edu/starweb/collectors/servlet.starweb?path=collectors/collectors.web
All collectors in above source seem to have ULAN URIs
Getty Provenance Index Remodel Project – started late last year, they’re eventually aiming to publish everything as linked data:
http://www.getty.edu/research/tools/provenance/provenance_remodel/index.html
I’m thinking it might be interesting to compare the Getty’s Knoedler dataset with the Whitney’s. While none of the Whitney Founding Collection objects came from Knoedler, they were an influential 19th-early 20th century gallery, and sold to Gertrude Vanderbilt Whitney’s father, Cornelius Vanderbilt II (http://www.artnews.com/2016/04/25/the-big-fake-behind-the-scenes-of-knoedler-gallerys-downfall/), among other influential robber barons of the time. I imagine there might be some overlap between artists or collectors in the two datasets, if nothing else.
Strangely, a cursory search of TMS reveals that Knoedler Gallery has only one related object in the database (a Jasper Johns artist book, ID 84.52)
The Getty’s provenance resources also skew heavily towards pre-20th century sources (presumably due to copyright issues) and records from European auction houses.
The Knoedler sales books do encompass the years the Whitney Founding Collection was amassed, as well as the decades before and after, so hopefully the Getty’s Knoedler dataset has some kind of linkage with the Whitney’s.
As the readme in the Getty’s Github repository notes, the Carnegie Museum also has their collection data available in both CSV and JSON format:
https://github.com/cmoa/collection
To go about connecting the Whitney’s dataset to the Getty’s and/or Carnegie’s, I would have to use a similar approach to what Hannah and Molly did for their Program for Cultural Heritage project:
https://github.com/MollieEcheverria/CH-LJ/blob/master/README.txt
I might need to query for URIs for the Getty/Carnegie names first.
Before I get into external datasets, I am going to start working with the Whitney’s data using OpenRefine.
I’m using this book as a reference:
Verborgh, R., De Wilde, M., & Sawant, A. (2013). Using OpenRefine: The Essential OpenRefine Guide That Takes You From Data Analysis and Error Fixing to Linking Your Dataset to the Web. Birmingham, England: Packt Publishing. Retrieved from http://search.ebscohost.com.ezproxy.pratt.edu:2048/login.aspx?direct=true&db=nlebk&AN=639455&site=ehost-live&ebv=EK&ppid=Page-__-20
A preliminary look at the RDF Refine extension for OpenRefine (http://refine.deri.ie/) is promising. I tried plugging in the BloodyBite CIDOC mapping namespace I think I mentioned previously (http://bloody-byte.net/rdf/cidoc-crm/core_5.0.1#) to get CIDOC classes/properties:
Sidenote: this namespace lookup resource looks helpful as well:
The Erlangen OWL mapping used by the British Museum works well too:
Since I have all the columns in my Google Sheets aligned with CIDOC properties, the RDF schema alignment process should hopefully be pretty easy.
This plugin lets you identify entity nodes as well, so I could probably just simplify things and recombine all my separate sheets into one CSV file rather than having one sheet per entity:
https://en.wikipedia.org/wiki/Node_(computer_science)
https://en.wikipedia.org/wiki/Linked_data_structure
OpenRefine also does Wikidata reconciliation!
You can reconcile against SPARQL endpoints too! (https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources)
Adding name reconciliation sources: http://iphylo.blogspot.com/2012/02/using-google-refine-and-taxonomic.html
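For reference, the same Wikidata reconciliation service OpenRefine talks to can be queried directly from a script. A rough sketch, assuming the standard reconciliation API request/response shape and using a placeholder name:

# Sketch: query the Wikidata reconciliation endpoint directly with requests.
# The endpoint and response structure assume the standard OpenRefine
# reconciliation API; the constituent name is a placeholder.
import json
import requests

endpoint = "https://tools.wmflabs.org/openrefine-wikidata/en/api"
queries = {"q0": {"query": "Edward Hopper"}}   # placeholder constituent name

resp = requests.get(endpoint, params={"queries": json.dumps(queries)})
for candidate in resp.json()["q0"]["result"]:
    print(candidate["id"], candidate["name"], candidate["score"])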
You can even generate RDF/XML or Turtle files from OpenRefine (no JSON-LD, sadly)
Publishing these RDF files on GitHub might be a simple preliminary way to share the Whitney’s data
Or, I could use something like this (https://github.com/semsol/arc2/wiki), though that might be again getting hung up on publication/databases.
It’s a bit tricky in practice:
In working with OpenRefine, I realize all my spreadsheets are a little out of control in terms of granularity.
Recombine spreadsheets into three main sheets (Constituent, Object, Event)
Do name reconciliation/RDF schema layout stuff in OpenRefine
Feed sheet into Gephi and make visualization.
Also spit out some RDF/XML files and convert them into N-Triples/JSON-LD (see the sketch after this list)
http://rdf-translator.appspot.com/
Clean and reconcile Getty/Carnegie data with OpenRefine too and try to make some connections.
Make a Gephi visualization mapping the connections
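For the RDF/XML conversion step above, a minimal rdflib sketch (file names are placeholders; the JSON-LD output assumes a recent rdflib or the rdflib-jsonld plugin is installed):

# Sketch: convert an RDF/XML export from OpenRefine into N-Triples and JSON-LD.
# File names are placeholders.
from rdflib import Graph

g = Graph()
g.parse("founding_collection_export.rdf", format="xml")                   # RDF/XML from OpenRefine
g.serialize(destination="founding_collection.nt", format="nt")            # N-Triples
g.serialize(destination="founding_collection.jsonld", format="json-ld")   # JSON-LD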
To watch: a video discussing the Getty’s linked data initiative: https://www.youtube.com/watch?v=1HRbP4zjqPM
I had the opportunity to attend a curator-led staff tour of one of the Whitney’s current exhibitions before lunch. Not directly related to project work, but interesting to get some context on the work in the show and trends of the era.
Prof. Pattuelli notified me of a livestreaming talk on 02/27/17 at the Yale Center for British Art about the Carnegie Museum Provenance project.
Newbury, D. (2017, February 27). Standardizing Museum Provenance for the Twenty-First Century. New Haven, CT: Yale Center for British Art. Retrieved from https://youtu.be/YKJqINwZ--o
David Newbury, who gave the talk, is the lead project developer of the ArtTracks project at the Carnegie, which had been ongoing for the past 3.5-4 years
The Carnegie project has taken much longer than Newbury had anticipated. Coming from an animation and data visualization background, Newbury guessed the project would take only a matter of weeks or months, not realizing the complexity of provenance data.
The Carnegie Museum’s linked data was not necessarily envisioned as linked open data.
The Carnegie does, however, recognize the need for sharing data across institutions.
What is the value of museums sharing their data with the public?
Are museums:
Fundamentally, however, museums are collectors.
Despite earlier hopes of researchers, linked data has not been successful in enabling web-scale AI (i.e. it hasn’t made the internet machine-readable).
Nor does it enable interoperability/easier collaboration.
Nor does it automate reconciliation/reduce workload.
Linked data is one of multiple ways of potentially representing data, each with its own pros and cons.
Aim of Carnegie project was to standardize provenance data so it could be presented as:
Need to preserve the nuances of provenance data contained within text
Advantage One: Allows for linking to other authorities
Linking to outside authorities lends the museum itself authority (i.e. certifies that museum is providing accurate information to patrons).
Name authorities allow museums to provide authoritative info without having to expend money on research
Name authorities = Museum saves money!
A museum is best suited to asserted authority – being an authority on the objects in its own collection, its own exhibitions, and other events that have taken place during the course of the museum’s existence.
Being an authority on things not within the sphere of an individual institution and its collection is better delegated to other sources. This delegated authority can be covered by various name authority sources (VIAF, ULAN, etc).
Reluctant authority – when you have to be the authority on a subject for which there are no authority records. For example, obscure constituents that only the curatorial department knows about, i.e. random art galleries from the 1930s.
As a reluctant authority, you are taking the reins as an authority until someone else publishes more definitive information.
“Temporary authority held out of necessity, not desire”
museum_provenance (https://github.com/arttracks/museum_provenance):
“The museum_provenance library is the core technology developed for this project. It takes provenance records and converts them into structured, well-formatted data.”- http://www.museumprovenance.org/
Tool for parsing provenance relationships from text fields (such as TMS Provenance field). For my purposes, probably more useful at later stage of project.
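Just to wrap my head around what “parsing provenance from text” means, a toy sketch (this is not the museum_provenance library itself, and the sample provenance string is made up):

# Toy sketch only: split a free-text provenance statement into per-owner segments.
# The sample string is hypothetical; real parsing (dates, roles, ownership gaps)
# is what museum_provenance actually handles.
import re

provenance_text = ("The artist; purchased by an example collector, New York, 1916; "
                   "Whitney Museum of American Art, New York, 1931.")

# Semicolons conventionally separate successive owners in provenance statements.
segments = [s.strip(" .") for s in re.split(r";", provenance_text) if s.strip()]
for i, segment in enumerate(segments, 1):
    print(i, segment)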
Elysa (https://github.com/arttracks/elysa):
“The Elysa tool is a user interface designed for museum professionals. It assists with verifying, cleaning, and modifying provenance records.”
It’s a GUI for extracting provenance information from text fields
MicroAuthority (https://github.com/arttracks/microauthority): Linked open data publication tool developed for smaller institutions looking to create URIs for constituents who don’t have records on any name authority sites. Given how obscure many of the Acquisition-Related constituents in the Founding Collection are, creating URIs for these people could be a goal of my project.
provenance_interactive (https://github.com/arttracks/provenance-interactive): Tool for creating visualizations of provenance information. Probably not too exciting for the Whitney’s purposes, since everything in the Founding Collection is American and made within the last couple centuries.
Baring_art_sales (https://github.com/arttracks/baring_art_sales): Data on the purchases of the Baring family in CSV and JSON form. Doesn’t really extend to the Whitney Founding Collection era (latest dates 1917), but interesting because it employs the Carnegie’s Acquisition Method Vocabulary, an ontology created for the project.
https://github.com/whosonfirst-data/whosonfirst-data
The Carnegie SKOS (http://www.museumprovenance.org/acquisition_methods.ttl) for their ontology seems mostly focused on very specific acquisition methods. It seems like it would make more sense as a supplement to CIDOC than as a stand-alone conceptual model.
All of the Carnegie’s apps are built on Ruby. To install Ruby, I followed these instructions: http://railsapps.github.io/installrubyonrails-mac.html
Gems in Ruby: http://guides.rubygems.org/what-is-a-gem/
Ruby is…time consuming to install
Trying to install the Carnegie’s Elysa app, but it keeps using an older, incompatible version
Eventually uninstalled/reinstalled Ruby, but was still unable to run Foreman server
The localhost IP address for OpenRefine, for future reference: http://127.0.0.1:3333/
I spent much of last week attempting unsuccessfully to install the Carnegie’s Elysa tool. Elsya has not been updated in almost two years, and I’m having issues installing Foreman, the process manager it runs on.
It has been a learning process in general working with the Carnegie’s tools, as they are all built with Ruby, a language I have not previously worked with.
In general, the Carnegie’s tools seem to require a high degree of technical skill to use. Given that one of the Carnegie’s stated goals with the ArtTracks project is to make these tools freely available to institutional collaborators, and that they are meant to facilitate interoperability, I feel like the complexity of their installation and use limits their applicability to institutions with robust IT departments. I can’t really see someone in the curatorial department of a smaller museum being able to install Elysa, though the Carnegie does state they will provide technical support.
Having played around with OpenRefine, I want to test whether Karma (http://usc-isi-i2.github.io/karma/) alone might allow for sufficient normalization to skip the step of OpenRefine.
I first separated all the different sheets in my master Google Sheet for the Founding Collection using this script:
https://www.drzon.net/export-all-google-sheets-to-csv/
I’m trying to create a MySQL database to import into Karma using my CSV files. Updating to the latest version of MySQL Community Server, however, has proven problematic.
Most of my afternoon was spent importing individual CSV files into MySQL and connecting them relationally.
I also did end up having to do some work with OpenRefine.
I’m hoping to eventually feed this database directly into Karma, as working with so many separate CSV files is rather confusing.
I’m still at it with the tabular data work. I spent the morning working alternately with MySQL Workbench and OpenRefine, trying to refresh my knowledge of SQL and regular expressions in the process.
The amount of time I’ve been spending on this normalization work is making me wonder whether I should just try feeding the data directly from TMS into Karma. Throughout this project, I feel like I’ve tended to get distracted and waste time whenever databases are concerned.
Realistically, if the Whitney wants to eventually publish all of its collection data as LOD, the museum is probably not going to have the resources to go through all this normalization work for every record. Importing data directly from TMS to Karma would allow the Whitney to publish at least some bare-bones linked data on Github (a la the Getty) with relatively little time and resources expended.
This is a super-simplified example of how provenance could be modeled with CIDOC using data straight from TMS:
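A rough rdflib sketch along these lines (my own simplification, using the Erlangen CIDOC namespace; the object number and constituent URIs are placeholders):

# Rough sketch: one acquisition expressed with CIDOC classes/properties
# via the Erlangen OWL namespace. All identifiers are placeholders.
from rdflib import Graph, Namespace, RDF

CRM = Namespace("http://erlangen-crm.org/current/")
WHITNEY = Namespace("http://example.org/whitney/")    # placeholder base URI

g = Graph()
g.bind("crm", CRM)

acquisition = WHITNEY["acquisition/31.100"]           # placeholder accession number
artwork = WHITNEY["object/31.100"]
donor = WHITNEY["constituent/example-donor"]
museum = WHITNEY["constituent/whitney-museum"]

g.add((acquisition, RDF.type, CRM.E8_Acquisition))
g.add((artwork, RDF.type, CRM["E22_Man-Made_Object"]))
g.add((acquisition, CRM.P24_transferred_title_of, artwork))
g.add((acquisition, CRM.P23_transferred_title_from, donor))
g.add((acquisition, CRM.P22_transferred_title_to, museum))

print(g.serialize(format="turtle"))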
As per this listserv post (http://si-listserv.si.edu/cgi-bin/wa?A3=ind1509&L=TMSUSERS&E=quoted-printable&P=1667361&B=–001a113a6aaa3737ea051eee5830&T=text%2Fhtml;%20charset=UTF-8), I am going to try Reconcile-CSV (http://okfnlabs.org/reconcile-csv/), an OpenRefine plugin, as the RDF plugin for OpenRefine seems to have issues with some reconciliation services.
http://backup.cni.org/topics/digital-humanities/rebuilding-the-getty-provenance-index-as-linked-data
https://www.youtube.com/watch?v=1HRbP4zjqPM
The Getty has tried both Karma and 3M, but has not settled on a favorite solution.
Joshua Gomez from the Getty likes Karma because of its ability to generate graphs
Screenshots of the Getty Model:
Interesting that the Getty uses E10 Transfer of Custody instead of E8 Acquisition. I talked to Prof. Pattuelli about this issue this past week, as she suggested using E10. CIDOC says:
“The interpretation of the museum notion of “accession” differs between institutions. The CRM therefore models legal ownership and physical custody separately. Institutions will then model their specific notions of accession and deaccession as combinations of these.”
“It may also describe events where a collector appropriates legal title, for example by annexation or field collection. The interpretation of the museum notion of “accession” differs between institutions. The CRM therefore models legal ownership (E8 Acquisition) and physical custody (E10 Transfer of Custody) separately. Institutions will then model their specific notions of accession and deaccession as combinations of these.”
Since provenance is, by definition, “a record of ownership of a work of art or an antique, used as a guide to authenticity or quality,” I feel that E8 is the more appropriate CIDOC entity class for this, as it specifically records legal ownership as opposed to physical possession. E10 could, however, also be included in a linked data model. This would probably require data from the Accession Sheet field in TMS.
According to Gomez, CIDOC’s recent expansion of purchase-based entity classes was precipitated by the Getty project.
Due to the slowness of triple-stores, Gomez decided to put the Getty’s LOD in Elasticsearch (https://github.com/elastic/elasticsearch). This allows for REST API searching. Less complicated than a SPARQL endpoint.
Gomez mentions a data ingest platform called Arches (http://archesproject.org/) being developed by the Getty Conservation Institute (http://www.getty.edu/conservation/our_projects/field_projects/arches/)
Arches is built to catalog immovable cultural heritage (ie sites, buildings)
I started playing around with the Gift constituents CSV file in Karma, but realized I need to do more OpenRefine work before import
I switched over to LODRefine from standard OpenRefine, as its capabilities seem to be slightly beyond those of the OpenRefine RDF plugin
After some exploration, plain OpenRefine works best. LODRefine is heavily reliant on deprecated services like Freebase, unfortunately.
OpenRefine’s reconciliation service makes querying for external URIs with Python pretty much unnecessary.
This guide (https://data-lessons.github.io/library-openrefine/05-advance-functions/) details how to split those reconciled URIs into their own columns.
I did the Gift Constituents list as a sample, and exported it as RDF/XML to see what intake into Karma would be like. I think uploading as tabular data might work better for Karma.
I’m planning to meet with Farris on Friday to touch base on the project
Nodes vs. edges: http://www.touchgraph.com/assets/navigator/help2/module_7_1.html
Importing CSV data: https://github.com/gephi/gephi/wiki/Import-CSV-Data
Karen Hwang from METRO/Linked Jazz just published an extremely helpful article on her own name reconciliation work.
Karen’s article:
http://www.mnylc.org/fellows/2017/03/17/using-openrefine-to-reconcile-name-entities/
Karen’s scripts:
https://github.com/kllhwang/Named-Entity-Reconciliation-with-OpenRefine
I am currently in the process of exploring three tools:
My current challenges and end goals:
I’ve decided to try cleaning up the Carnegie and Getty data in OpenRefine and to try reconciling some of the names in it.
I’m starting with the Carnegie’s Baring art sales data (https://github.com/arttracks/baring_art_sales), which is kind of a mess. The seller and artwork of these pieces are both in one column, for instance, which is not very helpful for provenance. The names of the actual sellers are also often pretty vague (“Sir Addington”, etc).
In reconciling the Getty provenance data, I learned that Wikidata links to a bunch of institutional URIs for artists, including the Smithsonian and British Museum!
https://www.wikidata.org/wiki/Q704868
After some difficulty, I’m finally getting the hang of mapping with Karma.
Since Karma is a pain and Gephi is probably beyond the scope of my technical abilities, I will focus on OpenRefine for now.
A nice deliverable would be a master Founding Collection dataset with URIs from as many other institutional repositories as possible.
Basically, just reconcile everything.
https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation
https://www.wikidata.org/wiki/Wikidata:List_of_properties
https://www.wikidata.org/wiki/Property:P1566
Goal – a Founding Collection Constituents Name Directory! Currently testing to see whether OpenRefine can auto-generate columns based on Wikidata properties. Some references:
Using OpenRefine to search for Wikidata properties is kind of time-consuming. Is it any less so than Python?
And…searching for Wikidata properties with OpenRefine was a failure. Python is probably fine.
Attempting to get GeoNames property from Wikidata via the Wikidata entity page for a place:
https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q200078&property=P1566
"https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=" + cell.recon.match.id + "&property=P1566"
"https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values?item=" + cell.recon.match.id + "&prop=P1566"
Yes! Once names are reconciled to Wikidata, OpenRefine can create a column based on any property! I used GeoNames to test, since Joshua had already queried it for constituent birth/death places.
Extract property from resulting JSON dictionary:
value.parseJson().values.replace('[', '').replace(']', '').replace('"', '')
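The same lookup is simple enough from Python with requests, reusing the Q200078/P1566 pair from the test above:

# Sketch: fetch the GeoNames ID (P1566) for a Wikidata item via the wbgetclaims API.
import requests

def get_claim_values(qid, prop):
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbgetclaims", "entity": qid, "property": prop, "format": "json"},
    )
    claims = resp.json().get("claims", {}).get(prop, [])
    return [c["mainsnak"]["datavalue"]["value"] for c in claims if "datavalue" in c["mainsnak"]]

print(get_claim_values("Q200078", "P1566"))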
A name directory with the URIs of Whitney Constituents from various other institutional repositories seems like it could be pretty useful.
More VIAF reconciliation details:
I guess JSON is the default export format for OpenRefine?
https://github.com/OpenRefine/OpenRefine/wiki/Export-As-YAML
But is it LD….?
You can export JSON from OpenRefine using the Templating function:
More on that: http://stackoverflow.com/questions/31328001/openrefine-working-with-templating-to-export-json-as-records
I’ve started collecting URIs for purchase-related constituents:
Props Added:
grel: value.replace(',', '')
grel: "http://www.wikidata.org/entity/" + cell.recon.match.id
grel: cell.recon.match.id
grel: "http://www.viaf.org/viaf/" + cell.recon.match.id
grel: "https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values?item=" + cell.recon.match.id + "&prop=P245"
grel: value.parseJson().values.replace('[', '').replace(']', '').replace('"', '')
grel: "http://vocab.getty.edu/page/ulan/" + value
grel: "http://collection.britishmuseum.org/id/person-institution/" + value
grel: "http://edan.si.edu/saam/id/person-institution/" + value
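A Python equivalent of the URI-building expressions above, for whenever this gets scripted outside OpenRefine (the sample IDs are placeholders):

# Sketch: build institutional URIs from already-reconciled/fetched IDs,
# mirroring the GREL expressions above. The sample IDs are placeholders.
constituent = {"wikidata": "Q42", "viaf": "123456789", "ulan": "500000000"}

uris = {
    "wikidata": "http://www.wikidata.org/entity/" + constituent["wikidata"],
    "viaf": "http://www.viaf.org/viaf/" + constituent["viaf"],
    "ulan": "http://vocab.getty.edu/page/ulan/" + constituent["ulan"],
}
print(uris)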
Fields used by the Whitney in TMS: