Monday, February 15, 2016

Postgres search - overlapping arrays


We've got some spare food hanging around that we need to get rid of, but we have to figure out which of our animals will eat it. Luckily we've recorded what each of them eats in our database! Given these three animals:

{ "species": "cat", "name": "frank", "eats": ["fish", "biscuits", "pudding"] }
{ "species": "dog", "name": "frankie", "eats": ["fish", "vegemite", "cat"] }
{ "species": "snake", "name": "francine", "eats": ["biscuits"] }

We want to find who will take the fish and pudding that we have sitting around. We'll get them out of their cage and bring them to the one spot to feed them. How can we figure this out from our data?

If the document were modelled to its 3rd normal form in SQL we could do some joins and figure it out without too much drama. To achieve the same outcome in Postgres with our jsonb storage, a simple approach is to store the values we want to search on - eats in our case - in their own column as an array (in addition to the jsonb column), then use the Postgres array comparison operators to search for rows where the eats column contains any of ['fish', 'pudding']. What does this look like using Massive?

db.run(
  `select * from animals
  where eats && $1`,
  [['fish', 'pudding']],
  (err, results) => {
    // do something interesting with the results
  });

Note that you'll need to add the extra column manually and populate it when saving a document:

db.animals.insert({body: jsonDocument, eats: jsonDocument.eats}, (err, results) => {})
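
Assuming the animals table already exists with the jsonb body column, adding the array column and an index to support those searches might look like this in SQL (the column and index names here are illustrative):

```sql
-- a plain text array column alongside the jsonb body
ALTER TABLE animals ADD COLUMN eats text[];

-- a GIN index lets Postgres answer && (overlap) queries without scanning every row
CREATE INDEX animals_eats_idx ON animals USING gin (eats);

-- rows whose eats column overlaps the values we have on hand
SELECT * FROM animals WHERE eats && ARRAY['fish', 'pudding'];
```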

Postgres - partial text search


Given the three documents below, we want to find the animals with name like 'frank'

{ "species": "cat", "name": "frank" }
{ "species": "dog", "name": "frankie" }
{ "species": "snake", "name": "francine" }

If this were SQL we would search name like '%frank%' and expect to get the first two results. Postgres & Massive give us a way to search for equality in a document, so:

db.animals.findDoc({name: 'frank'}, (err, res) => { /* do things with results */ })

Would only return the first result. (Note that Massive creates an index for your jsonb document if you let it create the table).

To search for partial matches we need to add a Trigram Index - this breaks text up into three-character chunks and gives a weighting to how well the search term matches, rather than giving us an exact match or nothing. For the example above, we would break the name property out into another column (Massive creates a 'search' column which you could populate, or just make your own). At this point we lose some of the ability to use Massive's findDoc functions, but we can still use the where function.

Creating the record would now look like

db.animals.insert({body: jsonDocument, name: jsonDocument.name}, function(err, res){});

And to query:

db.animals.where("name like $1", ['%frank%'], (err, result) => { /* do something with the result */ });
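
Under the hood the index works on those three-character chunks. Here's a rough sketch of the trigram idea in plain JavaScript, mimicking pg_trgm's behaviour of lowercasing the string and padding it with two leading spaces and one trailing space before taking every three-character window (a toy illustration, not what Postgres actually executes):

```javascript
// Extract the set of trigrams for a single word, pg_trgm style:
// lowercase, pad with two leading spaces and one trailing space.
function trigrams(str) {
  const padded = '  ' + str.toLowerCase() + ' ';
  const grams = new Set();
  for (let i = 0; i <= padded.length - 3; i++) {
    grams.add(padded.slice(i, i + 3));
  }
  return grams;
}

// Similarity = shared trigrams / total distinct trigrams.
function similarity(a, b) {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  return shared / (ta.size + tb.size - shared);
}

console.log(similarity('frank', 'frankie')); // well above pg_trgm's default 0.3 threshold
console.log(similarity('frank', 'snake'));   // 0 - no trigrams in common
```

Because 'frank' shares most of its trigrams with 'frankie', the index can satisfy the like '%frank%' search efficiently (pg_trgm can also rank fuzzy matches by similarity, using 0.3 as its default threshold).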

Postgres - a document database?

Storing large json documents in SQL servers is painful. You first need to map the object out to a relational structure, then deal with putting it back together when you're querying. In addition you probably need to use some ORM to store your objects. All of which makes me sad.

With Postgres, there is a great feature that lets you store your json object into a column, and do queries on the json natively in the database!

I'm using Massive.js for working with Postgres in node, which has first class support for jsonb storage and makes the query syntax for working with jsonb documents a bit nicer. It also lets you write your own SQL when you need to do some more custom queries.

If you don't want any other columns in your table (it'll give you id and created date for free), you can use the saveDoc function (more on the doc storage syntax here):

db.saveDoc("my_documents", jsonDocument, function(err, res) {
    // the table my_documents was created on the fly with id and created date
    // res is the new document with an ID created for you
});

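For reference, the table Massive generates for a document collection is roughly equivalent to this (a sketch of the shape, not the exact DDL Massive runs):

```sql
CREATE TABLE my_documents (
  id serial PRIMARY KEY,                  -- auto-incrementing id for free
  body jsonb NOT NULL,                    -- your document
  search tsvector,                        -- full text search column (yours to populate)
  created_at timestamptz DEFAULT now()    -- created date for free
);
```
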
If you want to store some more data in your table then you can:

  • Manually create the table yourself prior to using Massive
  • Let Massive create the initial table for you, then add columns as needed (Massive will also create the primary key and auto-incrementing function which is handy)

We'll look at some querying strategies in the next post (the documentation in the json section of the github repo is a great place to start).

Tuesday, August 18, 2015

Testing SQL Server Connections

I found this gem today while trying to confirm whether a Windows server had access to a SQL instance (thinking I'd have to install something and do 10 restarts) - it turns out you can create a .udl file, which brings up a simple window to test your database connection!

Friday, June 22, 2012

One of the issues of infrequent releases.....

It's been a month since our last deployment, and we've got two features that have been sitting in code for longer than that, and one that's been recently developed. One feature is 'hidden', the other isn't functional yet (a couple of external dependencies).

There are a couple of problems here:

  • Not getting feedback on features that are ready to be released
  • Keeping the information in your brain so that you can fix any bugs that arise when the feature finally gets released
  • Trying to remember all the little bits that need to be completed to make the feature ready to release

A lot of this has to do with the flow of development, and perceived issues with releasing small features often. But the one that has kept creeping up on me over the last couple of weeks is the realisation that I keep wanting to add new features - little ones that I think will make a difference, but aren't really necessary or proven to be required.

This mentality comes from a "we're not going to be able to push anything new for another month" mindset, so we try to pack everything in. It means the code isn't as lean as it should be. Instead of waiting for feedback, reacting, and pushing a new release, we just assume what people are after.

Have you found this to be an issue when you're doing infrequent releases?

Wednesday, February 22, 2012

Raven DB Multi Map Index

I'm going to be looking at getting multiple document types returned from Raven by querying some properties that both objects have.

We're going to be using Animals and Vehicles, and we want to be able to figure out which animals and vehicles operate on land. These documents don't share inheritance, so we can't just search the base properties - what we need is a mapping that collects all the common properties of these documents.

The objects are:

We need to start by mapping the common fields to a new object for use in our LINQ query.
We also create a mapping for each object type that we want to index, selecting the common fields into an anonymous type.

You may have heard a lot of talk about Map/Reduce in the document database world, but for this particular query we're just interested in the Map part - we don't want to reduce the results to a common object but rather return the original objects.

Now that we've defined our Map we need to tell Raven about it which is really easy. Just after calling DocumentStore.Initialize() you call IndexCreation.CreateIndexes(typeof(AnimalsAndVehicles).Assembly, DocumentStore) which takes care of creating the index, and now it's ready to use!

To use the MultiMap, the key part is the As<dynamic>() call - this will let us return any object from the query.


A couple of links that were helpful in putting this together and thinking about how the mapping works: the Raven Documentation and Daniel Lang's blog about Multi Map Indexes.

Wednesday, February 8, 2012

Basic Raven DB Indexing and Querying with Lucene

We've gotten up and running with Raven DB for our new project, which has been a delight to use from a code perspective, but one of the big gaps for me has been figuring out how to do SQL-like queries. I'm not too bad at putting a SQL query together, as it's something I do semi-regularly as part of my job, so I thought it shouldn't be too much of a stretch to start pulling some statistics.

I was wrong.

If you're using the .NET client, this is not something you particularly need to worry about as Raven will dynamically create indexes for you based on your LINQ query, which is awesome. My particular scenario is using the Raven web admin pages to do a query that I'm not particularly interested in coding - I just want to see some numbers quickly.

I'll talk you through the steps I've gone through to get a simple query working - hopefully it's of some benefit to you. To start with, you should know that RavenDB search is powered by Lucene - we'll be looking at that in a bit. I'm going to start with a data set of animals:


What I'd like to see in the browser is a list of all the living land animals. What this means is we want to do a SQL equivalent of where on 2 of the fields: Lives AND Status.


To make these two columns queryable we need to make an index - unlike SQL we can't just query whatever we want whenever we want - we've got to tell Raven which fields we're going to do a query on. To do this jump over to the 'Indexes' section and 'Create an Index'. What we're going to do is create a document that maps to the columns we need to query, so give your index a name and jump into the 'Maps'.


A breakdown of the parts:

  • from animals: animals is just the alias for the document we're going to refer to in the select
  • in docs.SimpleIndexExample: this provides the source for the map - docs is generally all documents in the database, and SimpleIndexExample is the document type
  • animals.Lives and animals.Status are the two columns we want to query on
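
Putting those parts together, the map itself ends up looking something like this (reconstructed from the breakdown above - your document type name will differ):

```
from animals in docs.SimpleIndexExample
select new { animals.Lives, animals.Status }
```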

Getting back to the point, I'm going to put together a query to get a list of all living land animals. Now that the index is created, we can click on it and use it to do a query (it'll look different depending on which version of Raven you've got running, but hopefully you'll be able to find the 'Query' section)

Creating the query is just a matter of telling Raven which field you want to query on, and the value to restrict it to, so it should look something like this (by the way, you can get some auto-completion on the map field by pressing ctrl + space):


If you haven't picked it up, there's a problem with our result set - we're getting a Shark and a TRex - this is not good! The problem is that Lucene is looking for anything that matches the query, effectively doing an 'OR' on our statement. What we need to do is tell Lucene to require that a certain value is present - this is done by putting a '+' symbol at the start of each required field:
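
To make the difference concrete, assuming the indexed values are Land and Alive (illustrative values - use whatever your documents actually contain), the first query below gets Lucene's default OR behaviour, while the second requires both terms to match:

```
Lives:Land Status:Alive
+Lives:Land +Status:Alive
```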


For heaps more info about using Lucene to query, have a look at the Lucene Query Parser Syntax page.

Please let me know if you have any comments, ways I can think about this better as this is still a very new area for me, or if you've found it helpful.