Tuesday, May 14, 2019

Audit logging with Kinesis Firehose

Audit logs

We had a requirement to maintain an audit log of user actions that management would be able to report on one day. Our systems currently forward logs to Datadog, but we have fairly short retention periods. Extending the retention period for a limited number of logs would be too costly at this stage.

The other option that seemed to make sense was storing these logs in S3, where they could easily be searched later using AWS Athena. I'd heard of Kinesis Firehose before, so this seemed like an ideal case to try it out.

Kinesis Firehose lets you put events into a stream and configure an output destination, S3 being one of the options. It takes care of bundling events into files and transforming and delivering the data, at very little cost and without standing up any servers.

The code

You can set up Kinesis Firehose through CloudFormation (via the Serverless framework, specifically) with an IAM role and a delivery stream definition.

Creating the infrastructure is really fast, so this works well if you create a new stack for each branch of code. If you include the branch name in the S3 prefix, it's easier to tell which logs belong to which branch, and you can point Athena at an S3 prefix rather than the whole bucket.

Stream Configuration

Create an IAM role for the stream that allows it to write to a given path in S3, then configure the stream.
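As a rough sketch of those two resources, here they are built as a plain JavaScript object in CloudFormation's shape. The resource names, the audit/<branch>/ prefix, the policy name and the buffering values are all assumptions for illustration, not the original configuration; the property names (ExtendedS3DestinationConfiguration, BufferingHints and so on) are CloudFormation's own.

```javascript
// Sketch only: resource names, the "audit/<branch>/" prefix, the policy name
// and the buffering values are assumptions, not the original configuration.
function buildAuditStreamResources({ bucketArn, branch }) {
  // Keep each branch's logs under its own prefix so Athena can target it.
  const prefix = `audit/${branch}/`;

  return {
    AuditStreamRole: {
      Type: 'AWS::IAM::Role',
      Properties: {
        // Let the Firehose service assume this role.
        AssumeRolePolicyDocument: {
          Version: '2012-10-17',
          Statement: [
            {
              Effect: 'Allow',
              Principal: { Service: 'firehose.amazonaws.com' },
              Action: 'sts:AssumeRole',
            },
          ],
        },
        // Bucket-level actions on the bucket; object access only under this branch's prefix.
        Policies: [
          {
            PolicyName: 'audit-stream-s3-write',
            PolicyDocument: {
              Version: '2012-10-17',
              Statement: [
                {
                  Effect: 'Allow',
                  Action: [
                    's3:AbortMultipartUpload',
                    's3:GetBucketLocation',
                    's3:GetObject',
                    's3:ListBucket',
                    's3:ListBucketMultipartUploads',
                    's3:PutObject',
                  ],
                  Resource: [bucketArn, `${bucketArn}/${prefix}*`],
                },
              ],
            },
          },
        ],
      },
    },
    AuditStream: {
      Type: 'AWS::KinesisFirehose::DeliveryStream',
      Properties: {
        DeliveryStreamType: 'DirectPut',
        ExtendedS3DestinationConfiguration: {
          BucketARN: bucketArn,
          Prefix: prefix,
          RoleARN: { 'Fn::GetAtt': ['AuditStreamRole', 'Arn'] },
          // Firehose writes a file when either limit is reached first.
          BufferingHints: { IntervalInSeconds: 300, SizeInMBs: 5 },
          CompressionFormat: 'UNCOMPRESSED',
        },
      },
    },
  };
}
```

You could merge the returned object into the resources.Resources section of your Serverless configuration (for example from a serverless.js file), passing the branch name in at deploy time.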

Push events in our app

Now that we have a stream, creating a module for sending events is simple.
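A minimal sketch of such a module, assuming the AWS SDK for JavaScript v2 (its Firehose client has a putRecord call with a .promise() helper). The createAuditLogger name and the event fields are hypothetical; the client is passed in so it can be swapped for a fake in tests.

```javascript
// Sketch of an audit-event module. The function name and event fields are
// hypothetical; `firehose` is expected to be an AWS SDK v2 Firehose client.
function createAuditLogger(firehose, deliveryStreamName) {
  return {
    log(userId, action, details = {}) {
      const event = {
        userId,
        action,
        details,
        timestamp: new Date().toISOString(),
      };

      return firehose
        .putRecord({
          DeliveryStreamName: deliveryStreamName,
          // Firehose concatenates records as-is, so end each one with a newline
          // to keep one JSON object per line in the S3 files.
          Record: { Data: JSON.stringify(event) + '\n' },
        })
        .promise();
    },
  };
}

// Usage (assumes the aws-sdk v2 package and an AUDIT_STREAM_NAME env var):
// const AWS = require('aws-sdk');
// const audit = createAuditLogger(new AWS.Firehose(), process.env.AUDIT_STREAM_NAME);
// audit.log('user-123', 'report.exported', { reportId: 42 });
```

The trailing newline matters later: Athena's JSON SerDes expect one JSON object per line, and Firehose does not add a delimiter between records for you.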

Thursday, March 30, 2017

JavaScript filter thisArg

I was attempting to write an amazing filter that would return an array of objects where a property was true or false (and I wanted to be able to choose the value at runtime). This seemed reasonably straightforward:

function filterBySomeProp(object) {
  // In non-strict code, `this` is a Boolean object here, not the primitive true
  return object.someBoolProp === this;
}
bigArrayOfObjects.filter(filterBySomeProp, true);

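Unfortunately it didn't give the expected result. It turns out that in non-strict code, a primitive boolean passed as the thisArg is converted to a Boolean object, so the strict equality check (===) fails, because it compares type as well as value. (In strict mode this stays a primitive, which makes the bug easy to miss.) Wrapping the other side with Boolean() doesn't help, since Boolean() without new also returns a primitive. The fix is to unwrap this with valueOf() when doing the check:

function filterBySomeProp(object) {
  // valueOf() returns the primitive whether or not `this` was boxed
  return object.someBoolProp === this.valueOf();
}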
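If you'd rather not rely on thisArg at all, a closure over the runtime value sidesteps the boxing entirely. This is a swapped-in technique rather than the approach above; bySomeProp and the sample data are made up for illustration.

```javascript
// Alternative to thisArg: build the predicate from a closure over the runtime value.
// Arrow functions ignore filter's thisArg, so the value has to come from the closure.
const bySomeProp = (value) => (object) => object.someBoolProp === value;

// Made-up sample data standing in for bigArrayOfObjects.
const bigArrayOfObjects = [
  { id: 1, someBoolProp: true },
  { id: 2, someBoolProp: false },
  { id: 3, someBoolProp: true },
];

const trueOnes = bigArrayOfObjects.filter(bySomeProp(true)); // ids 1 and 3
const falseOnes = bigArrayOfObjects.filter(bySomeProp(false)); // id 2
```

Because value is never bound as this, it stays a primitive in both strict and non-strict code.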

Happy filtering!

Friday, February 3, 2017

Learning in the open - mobprogramming

One of the best things about mob programming (#mobprogramming on twitter) is the way that the team shares learning.

Our team recently worked on a tool to remove unused AWS stacks, dedicating quite a few lunchtimes to the cause. We decided it would be a great opportunity to try some mob programming, as it was a project that we were all excited about.

In the process we

  • Learnt some Emacs
  • Learnt about using the Serverless framework (AWS lambda under the hood) for the first time
  • Worked out how we were going to test our application
  • Familiarised ourselves with recursion and promises in JavaScript
  • Came up against deployment and runtime problems

While each of these could have been done solo, mob programming gave us the opportunity to:

  • Talk the problems through, coming up with better design
  • Get comfortable asking for an explanation when we're unclear - showing that we don't know it all
  • Bring something new to the table that other people don't know - even if we've just found it ourselves
  • Teach others
  • Share the joy of seeing something working that we've worked on so closely together


One of the biggest benefits is individuals showing they don't know everything. Often as software developers we suffer from imposter syndrome, but just exposing that we're all learning can encourage a better culture where people are happy being vulnerable - where they feel they can ask questions without judgement.

Have you tried mob programming? What do you think are the benefits? Drawbacks?

Tuesday, January 31, 2017

Spinning down your cloud costs

One of the best things about the cloud is the ability to spin up new infrastructure for your dev environment. You can have a completely isolated environment for a new branch that you're working on, allowing you to test your changes in isolation.

This is fantastic as you can speed ahead on changes unencumbered - the only problem is that you'll leave a trail of cash sitting in your cloud provider's pockets, which often doesn't work out so well for your business.

To attack our cost blowouts, we started a lunchtime side project to figure out how to stop the hurt, and came up with Batman, built using Serverless. Most of our infrastructure is managed in a single place (CloudFormation), with a set of tags on each stack including the slice (or branch name), and CloudFormation records a created date and last updated date for each stack. Using this information, Batman runs every night to clean up stacks on non-master branches that haven't been updated for 7 days. This is usually enough time for us to have tested the branch and integrated it into the master stack.
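The nightly rule can be sketched as a pure function over stack descriptions in the shape CloudFormation's DescribeStacks returns (StackName, CreationTime, an optional LastUpdatedTime, Tags). The tag key "slice" and the function names are assumptions; this is not Batman's actual code.

```javascript
// Sketch of the nightly selection rule. Assumes stacks shaped like
// DescribeStacks output and a hypothetical "slice" tag holding the branch name.
const DAY_MS = 24 * 60 * 60 * 1000;

function sliceOf(stack) {
  const tag = (stack.Tags || []).find((t) => t.Key === 'slice');
  return tag ? tag.Value : undefined;
}

function selectStacksToDelete(stacks, now = new Date(), maxIdleDays = 7) {
  return stacks
    .filter((stack) => {
      const slice = sliceOf(stack);
      // Leave untagged stacks and master alone.
      if (!slice || slice === 'master') return false;
      // LastUpdatedTime is absent for stacks never updated, so fall back to CreationTime.
      const lastTouched = new Date(stack.LastUpdatedTime || stack.CreationTime);
      return now - lastTouched > maxIdleDays * DAY_MS;
    })
    .map((stack) => stack.StackName);
}
```

Keeping the selection separate from the deletion call makes it easy to test, and to run as a dry run before anything is actually removed.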

We've also recently added a webhook trigger that deletes a stack when its branch is deleted in git, which saves us up to 7 days of running costs.
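Matching a deleted branch to its stacks might look like the sketch below. It assumes a GitHub-style "delete" webhook payload ({ ref, ref_type }) and the same hypothetical "slice" tag; the post doesn't say which git host or payload format was used, and branch names may need the same sanitising your deploys apply before they match a tag.

```javascript
// Hypothetical webhook logic: assumes a GitHub-style "delete" event payload
// ({ ref, ref_type }) and a "slice" tag holding the branch name on each stack.
function stacksForDeletedBranch(payload, stacks) {
  // Ignore tag deletions, and never touch master.
  if (payload.ref_type !== 'branch' || payload.ref === 'master') return [];
  return stacks
    .filter((stack) =>
      (stack.Tags || []).some((t) => t.Key === 'slice' && t.Value === payload.ref)
    )
    .map((stack) => stack.StackName);
}
```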

We've stopped over 300 stacks in a month, and at a guesstimate of $1 per day in running costs per stack, the dollars quickly add up.

There are many ways to save costs, and for our journey into AWS, Batman has come along at the right time. Not only has it been interesting to explore Serverless, but our team also got to do a lot of the work through mob programming, which was really fun (more to come).

Monday, February 15, 2016

Postgres search - overlapping arrays


We've got some spare food hanging around that we need to get rid of, but have to figure out which of our animals will eat it. Luckily we've recorded what each of them eats in our database! Given these three animals:

{ "species": "cat", "name": "frank", "eats": ["fish", "biscuits", "pudding"] }
{ "species": "dog", "name": "frankie", "eats": ["fish", "vegemite", "cat"] }
{ "species": "snake", "name": "francine", "eats": ["biscuits"] }

We want to find out who will take the fish and pudding we have sitting around, so we can get them out of their cages and bring them to one spot to feed them. How can we figure this out from our data?

If the document were modelled in third normal form in SQL, we could do some joins and figure it out without too much drama. In Postgres with jsonb storage, a simple way to achieve the same outcome is to store the values we want to search on (eats, in our case) in their own array column, in addition to the jsonb column, and then use Postgres's array overlap operator (&&) to find rows where the eats column contains any of ['fish', 'pudding']. What does this look like using Massive?

// The outer array holds the query parameters; the inner array is bound to $1
db.run(`select * from animals
  where eats && $1`,
  [['fish', 'pudding']],
  (err, results) => {
    // do something interesting with the results
  });
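To make the semantics of && concrete, here is a plain JavaScript model of the same overlap test run against the three sample documents. It only illustrates what the operator checks; Postgres does the real work in the query above, and the overlaps name is hypothetical.

```javascript
// Plain-JavaScript model of Postgres's array overlap operator (&&):
// true when the two arrays share at least one element.
function overlaps(a, b) {
  const wanted = new Set(b);
  return a.some((item) => wanted.has(item));
}

const animals = [
  { species: 'cat', name: 'frank', eats: ['fish', 'biscuits', 'pudding'] },
  { species: 'dog', name: 'frankie', eats: ['fish', 'vegemite', 'cat'] },
  { species: 'snake', name: 'francine', eats: ['biscuits'] },
];

// frank and frankie match; francine only eats biscuits.
const hungry = animals.filter((animal) => overlaps(animal.eats, ['fish', 'pudding']));
```

If you instead wanted only the animals that eat both foods, Postgres's contains operator (@>) is the one to reach for: eats @> ARRAY['fish', 'pudding'] would match frank alone.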

Note that you'll need to add the extra column manually and populate it when saving a document:

db.animals.insert({body: jsonDocument, eats: jsonDocument.eats}, (err, results) => {})

Postgres - partial text search


Given the three documents below, we want to find the animals with a name like 'frank':

{ "species": "cat", "name": "frank" }
{ "species": "dog", "name": "frankie" }
{ "species": "snake", "name": "francine" }

If this were SQL and we searched with name like '%frank%', we'd expect to get the first two results. Postgres and Massive give us a way to search for equality within a document, so:

db.animals.findDoc({name: 'frank'}, (err, res) => {
  // do things with results
});

Would only return the first result. (Note that Massive creates an index for your jsonb document if you let it create the table).

To search efficiently for partial results, we can add a trigram index (from Postgres's pg_trgm extension). This breaks text up into three-character chunks and gives a weighting to how well the search term matches, rather than giving us an exact match or nothing; the index can also serve like '%frank%' queries. For the example above, we would break the name property out into its own column (Massive creates a 'search' column which you could populate, or you can make your own). At this point we lose some of the ability to use Massive's findDoc functions, but can still use the where function.
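A simplified model of how pg_trgm chunks and scores a single word: per the pg_trgm documentation, each word is treated as having two spaces prefixed and one space suffixed, and similarity is the share of trigrams the two strings have in common. This sketch handles one lowercase word only and skips pg_trgm's treatment of non-alphanumeric characters and multi-word strings; trigrams and similarity are hypothetical names.

```javascript
// Simplified model of pg_trgm for a single word: pad with two leading spaces
// and one trailing space, then take every 3-character slice.
function trigrams(word) {
  const padded = `  ${word.toLowerCase()} `;
  const result = new Set();
  for (let i = 0; i + 3 <= padded.length; i++) {
    result.add(padded.slice(i, i + 3));
  }
  return result;
}

// Similarity = shared trigrams / distinct trigrams across both words.
function similarity(a, b) {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared++;
  return shared / (ta.size + tb.size - shared);
}

// similarity('frank', 'frankie')  -> 5/9, about 0.56
// similarity('frank', 'francine') -> 4/11, about 0.36
```

This is why a trigram search ranks frankie above francine for 'frank' instead of treating both as simple misses.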

Creating the record would now look like

// insert writes to the body and name columns; saveDoc would nest this whole object inside body
db.animals.insert({body: jsonDocument, name: jsonDocument.name}, (err, res) => {});

And to query:

db.animals.where("name like $1", ['%frank%'], (err, result) => {
  // do something with the result
});

Postgres - a document database?

Storing large JSON documents in SQL databases is painful. You first need to map the object out to a relational structure, then deal with putting it back together when you're querying. On top of that, you probably need an ORM to store your objects. All of which makes me sad.

With Postgres, there is a great feature that lets you store your JSON object in a column and query the JSON natively in the database!

I'm using Massive.js for working with Postgres in node, which has first class support for jsonb storage and makes the query syntax for working with jsonb documents a bit nicer. It also lets you write your own SQL when you need to do some more custom queries.

If you don't want any other columns in your table (it'll give you an id and created date for free), you can use the saveDoc function (the Massive documentation has more on the doc storage syntax):


db.saveDoc("my_documents", jsonDocument, function(err,res) {
    //the table my_documents was created on the fly with id and created date
    //res is the new document with an ID created for you
});

If you want to store some more data in your table then you can:

  • Manually create the table yourself prior to using Massive
  • Let Massive create the initial table for you, then add columns as needed (Massive will also create the primary key and auto-incrementing function which is handy)

We'll look at some querying strategies in the next post (the JSON section of the Massive GitHub repo's documentation is a great place to start).