featureenvy

The blog of a thinkerer.
By @featureenvy

All rights reserved.

How Does An ORM Work?

One of the things all of us use regularly and (most?) of us have no idea how it actually works is an Object Relation Mapper or ORM. I think it is crazy that a piece that we rely on and couldn't work without anymore is so foreign to most of us. So I decided to change that. And end of this wordy piece I want you to know how a ORM works. Of course you won't know all the details, like how some ORMs seem to be able to do more with the database that you even knew was possible. But some basic should be cleared up by then.

On DataMapper

For this, I decided to deep dive into DataMapper. DataMapper is an alternative to ActiveRecord, but with less magic, which is great if you are trying to read something. A basic DM model looks something like this:
class Duck
  include DataMapper::Resource

  property :id, Serial
  property :name, String
  property :hungry, Boolean
end
DataMapper::Resource is the base include that enables all the magic DataMapper functionality. Like in ActiveRecord, where you inherit from ActiveRecord::Base. The property method is defined in DataMapper::Model::Property. This method does a lot of stuff! In fact, it spans from line 27 to line 99! First up, it determines the class based on the type we provided in the class definition:
klass = DataMapper::Property.determine_class(type)
So if we define property :name, String, it will determine that we want a String! Mind bending, isn't it? Afterwards, it will instantiate the class (in our case DataMapper::Property::String). Then we get the now defined properties in the given repository (basically a "different" DB connection) and add our property to it. Easy. Now we have some filler code until the end, with the exception of these two babies:
create_reader_for(property)
create_writer_for(property)
This (shockingly!) creates the readers and writers for the property we defined. So what we know by now is that DataMapper keeps a list of all properties defined for a given class, so it can query, search, create etc. based on these values.

Querying DataMapper

So we can create a model, but that is really now all that a ORM does. With a ORM, we can of course query for data, and that is what makes it really interesting! So let's get to it! Querying is done in DataMapper::Model. Lets take an example query:
Duck.all
We find all on lines 341 - 348.
def all(query = Undefined)
  if query.equal?(Undefined) || (query.kind_of?(Hash) && query.empty?)
    # TODO: after adding Enumerable methods to Model, try to return self here
    new_collection(self.query.dup)
  else
    new_collection(scoped_query(query))
  end
end
So we are generating a new collection with the given query. Cool. Model#new_collection makes a new collection (line 749 for the lazy ones). D'uh! So what happens when we generate a new collection? This:
    def initialize(query, resources = nil)
      raise "#{self.class}#new with a block is deprecated" if block_given?

      @query        = query
      @identity_map = IdentityMap.new
      @removed      = Set.new

      super()

      # TODO: change LazyArray to not use a load proc at all
      remove_instance_variable(:@load_with_proc)

      set(resources) if resources
    end
(On Github) You might now realize: Nothing happens! No database connection is made, nothing! In fact, DataMapper doesn't actually perform the database call until really, really necessary! It just returns a "placeholder" collection if you want that has the information needed to actually perform the call. One way to trigger that call would be #first:
    def first(*args)
      first_arg = args.first
      last_arg  = args.last

      limit_specified = first_arg.kind_of?(Integer)
      with_query      = (last_arg.kind_of?(Hash) && !last_arg.empty?) || last_arg.kind_of?(Query)

      limit = limit_specified ? first_arg : 1
      query = with_query      ? last_arg  : {}

      query = self.query.slice(0, limit).update(query)

      if limit_specified
        all(query)
      else
        query.repository.read(query).first
      end
    end
Quite a bit of code here, luckily we can ignore most of it. For example the first few lines are just applicable if we actually pass in some options (which we won't do for now). The first really interesting line is line 16. Here we actually call query.repository.read(query).first. Here we call repository#read. So lets see this in action.
      def read(query)
        fields = query.fields
        types  = fields.map { |property| property.dump_class }

        statement, bind_values = select_statement(query)

        records = []

        with_connection do |connection|
          command = connection.create_command(statement)
          command.set_types(types)

          # Handle different splat semantics for nil on 1.8 and 1.9
          reader = if bind_values
            command.execute_reader(*bind_values)
          else
            command.execute_reader
          end

          begin
            while reader.next!
              records << Hash[ fields.zip(reader.values) ]
            end
          ensure
            reader.close
          end
        end

        records
      end
This is a lot simpler than it looks. It takes the query, constructs a select query and then executes it. Simple as that! In the end, it maps the values it got back from the Database into "real" objects and returns the given records to us.

So That's It

Lets review all that. So an ORM needs a structure to tell it how the database tables should be mapped to the object. DataMapper does this with the DataMapper::Model#property method, where ActiveRecord reads this data from a schema file. In the end, both automatically generate setters and getters for the properties. Then we have a layer that stores the queries, the results, etc. and constructs the right, database agnostic data so the query can be fulfilled. And the third layer, where I really glossed over very shortly, is the database adapter (Do you want to know some more about the database adapter themselves? Then contact me on Twitter or in the comments and tell me about it!). The database adapter knows how the query should be run in the database and knows how to transform the results back into the objects. And that's the way we query a database! If you liked this post, you should follow me on Twitter and signup for the RSS feed!