Monday, February 15, 2016

Postgres search - overlapping arrays

Postgres Overlapping Arrays

We've got some spare food hanging around that we need to get rid of, but have to figure out which of our animals will eat it. Luckily we've recorded what each of them eat in our database! Given the three animals

{ 'species': 'cat', 'name': 'frank', 'eats': ['fish', 'biscuits', 'pudding'] }
{ 'species': 'dog', 'name': 'frankie', 'eats': ['fish', 'vegemite', 'cat'] }
{ 'species': 'snake', 'name': 'francine', 'eats': ['biscuits'] }

We want to find who will take the fish and pudding that we have sitting around. We'll get them out of their cage and bring them to the one spot to feed them. How can we figure this out from our data?

If the document was modelled to it's 3rd normal form in sql we could do some joins and figure it out without too much drama. To do this in Postgres with our jsonb storage a simple way to achieve the same outcome is to store the values we want to search on - eats in our case - is to store those values in their own column as an array (in addition to the jsonb column) and then use the Postgres array comparisons to do a search where the eats column contains any of ['fish', 'pudding']. What does this look like using Massive?

db.run(`select * from animals
  where eats && $1`,
  ['fish', 'pudding'],
  (err, results) => {
    // do something interesting with the results
})

Note that you'll need to add the extra column manually and populate it when saving a document:

db.animals.insert({body: jsonDocument, eats: jsonDocument.eats}, (err, results) => {})

Postgres - partial text search

Postgres Partial Text Search

Given the three documents below, we want to find the animals with name like 'frank'

{ 'species': 'cat', 'name': 'frank' }
{ 'species': 'dog', 'name': 'frankie' }
{ 'species': 'snake', 'name': 'francine' }

If this were sql we would search name like %frank% we'd expect to get the first two results. Postgres & Massive gives us a way to search for equality in a document so:

db.animals.findDoc({name: 'frank'}, (err, res) => { // do things with results })

Would only return the first result. (Note that Massive creates an index for your jsonb document if you let it create the table).

To search for partial results we need to add a Trigram Index - this breaks up text into three character chunks and gives a weighting to the search term match, rather than giving us an exact match or nothing. For the example above, we would break out the name property into another column (Massive creates a 'search' column which you could populate, or just make your own) - at this point we lose some of the ability to use Massive findDoc functions, but can still use the where function.

Creating the record would now look like

db.saveDoc("animals", {body: jsonDocument, name: jsonDocument.name}, function(err,res){});

And to query:

db.animals.where("name like $1", ['%frank%''], (err, result) => { // do something with the result });

Postgres - a document database?

Storing large json documents in SQL servers is painful. You first need to map the object out to a relational structure, then deal with putting it back together when you're querying. In addition you probably need to use some ORM to store your objects. All of which makes me sad.

With Postgres, there is a great feature that lets you store your json object into a column, and do queries on the json natively in the database!

I'm using Massive.js for working with Postgres in node, which has first class support for jsonb storage and makes the query syntax for working with jsonb documents a bit nicer. It also lets you write your own SQL when you need to do some more custom queries.

If you don't want any other columns in your table (it'll give you id and created date for free), you can use the saveDoc function (more on the doc storage syntax here):


db.saveDoc("my_documents", jsonDocument, function(err,res) {
    //the table my_documents was created on the fly with id and created date
    //res is the new document with an ID created for you
});

If you want to store some more data in your table then you can:

  • Manually create the table yourself prior to using Massive
  • Let Massive create the initial table for you, then add columns as needed (Massive will also create the primary key and auto-incrementing function which is handy)

We'll look at some querying strategies in the next post (there is documentation in the json section of the github repo which is a great place to start)