finnikk

The blog of a thinkerer.
By @finnikk

All rights reserved.

3 Things You Could Know If You Read "Eloquent Ruby" by Russ Olsen: A Review

Starting with Ruby isn't easy. Especially if you are like me. I have, like many before me, a Java background. So I knew statically typed languages, I knew what it meant, I knew what its strength were. And, like the monkey that I was, I thought statically typed languages were the best! Why? IDEs have a really easy time to refactor a lot of code. Everything was clear. Well, you know, the standard arguments you probably also here when we talk about statically typed languages. But this post has nothing to do with statically typed languages, with me having seen the light at the of the tunnel and all that. And since you are reading this, I guess you did the same! But, fun fact, here, for free, no sign-up needed: Ruby can do a lot more. A lot! And it has it warts, and it has its way of doing things. This isn't easy to understand when you are knew, or even when you are experienced. So here are 3 tips on what you could learn from reading Eloquent Ruby by Russ Olsen:

Strings In Ruby Are Awesome! Literally!

Ever had to write a string in Java? I feel sorry for you, because I have written way too many back in the days. Remember "This is a \"String\"" or even better, a Regex? (Hint: "\\..*", and yes, that's basically a smiley. That's also my theory that smilies are a result of Java programmers using Regex). The first string problem is easy to solve in Ruby: Just use a single quote! But even better, if you mix single and double quotes in a sentence ("And here shall "REST" Java. It's done.") we can solve this with a string literal: %q{And here shall "REST" Java. It's done.}. See? No problem. In fact, we could even replace { with '$', since we never use a '$' in our sentence! Or a lot of other special characters for that matter. The Regex (or Regexp?) we could write as /\..*/. No double escaping needed! And yes, it still resembles a smiley, I realize that. It's still a Regex after all! There is a lot more that can be done with string literals, so you should check that out in the book!

Ruby Has Hooks For A Lot Of Things

Ever written a module? Imagine that you are that module. This is carry stuff! Just think about it: You have to blindly stumble around, never knowing which class carries you around! Luckily, Ruby offers us hooks that we can use to alleviate these problems. Each module has a self.included method that you can use. It is called every time a class, well, includes this module. Even better, it even tells you which class it was! This way you can build up a list of all classes that have this module included. Doesn't sound very handy? But maybe, when the module is included, you also want to extend that class. Maybe you have a method that should not be on the class, but on the object! With self.included and other hooks, everything's possible!

How To Build Your Own DSL

DSLs. How much we all love these little things, and how much we use them each and every day. Thank you, dynamic typing! Probably you have at least seen some RSpec code. Yes, all that let, describe, and other syntactic sugar is just a DSL, implemented in Ruby. Do you want to know how to do this? Then go read the book! It has a few caveats that I can't all list here, because this is just a blog, not a book! Even better, he also mentions how to build an external DSL. An external DSL is not written in Ruby, and has to be parsed "by hand" with Ruby (without the Ruby interpreter helping). Which does have its pros and cons, as you might imagine.

Conclusion

All in all, a very good book that opened my eyes (and still does, on my second reading of it!). One of the really nice features about the book is its structure. Russ gives us first an introduction why we want to do this, then tells us how. So far so good. But then he continues: He also tells us how you can get in trouble with it. To top it off, he shows some examples of the described concept in the wild. If you are a Rubyist and you haven't yet read Eloquent Ruby, I highly suggest you do. Because Ruby can do a lot more than you might image! If you liked this post, you should follow me on Twitter and sign up for the RSS feed!

What Others Know About Compression That You Don't

Ah, the compression formats. Always a mystery. Always just something we will use, but something we never understand. For one thing, this makes sense, considering Wikipedia alone lists 15 different compression formats. There are the well-known bz2, gz and LZMA algorithms. And the less well-known rz, ?Q? and ??_. Now that last one makes me happy this isn't a podcast... Though for this post we will concentrate on Deflate. Deflate is not only used in gzip and zip, but is also a fan favorite to compress web assets when sent to the browser.

The Basic Building Blocks

Underneath Deflate uses LZ77 combined with a Huffman encoding. LZ77 is used to detect duplicate strings and insert a so-called back reference (more on this later). Huffman Encoding is similar to ASCII, but instead of always using 8 bits, it stores the most used characters in 3 bits, and the least used ones, with, well, more bits. More on this also later.

Come On, Deflate Already!

First, Deflate chops up all parts into blocks to make processing easier. Then it decides in which mode it should encode that block:
  1. Store raw stream
  2. Store with a static Huffman encoding
  3. Store with a dynamic Huffman Encoding
Now, it still doesn't deflate. But lets see. First mode is raw mode. In raw mode, Deflate decides that the data is already compressed and it can't do a better job. So it just inserts its header to say "I can't do anything with that, this is already compressed, dummy!". That's why, the next time a coworker shouts at you why you didn't ZIP these files, you can shout back: "D'oh, mode 1, man!". Second mode is static Huffman mode. This just means that Deflate decides that the data is standard enough that the standard Huffman Encoding table should be used. For example useful for English text. This way Deflate doesn't have to add a table as overhead to the compressed content. Third is dynamic Huffman mode. With dynamic Huffman mode, the content gets compressed with a Huffman Encoding, but the encoding table is generated based on the input. So for example, in Finish, the shortest code would probably be "ö", or something like that. In @zedshaw's recent tweets it could be "C", "F", "K" and "U", who knows! But that leaves us with two question: Huffman Encoding, wut? And LZ77?

Huffman Encoding

Huffman Encoding is something that a guy called Huffman developed while getting a Ph.D from MTI MIT. Nothing surprising until now! The paper on it is actually from 1952. Damn it! That's double my age! That paper should feel old! What it basically says: Tally up all individual characters. The most used one gets the shortest code (at 3 bits). Then continue down until you arrived at the end. Actually, Wikipedia does a better job at that than I ever could, so here's a picture: [caption id="attachment_241" align="alignnone" width="600"]Huffman Coding Tree Huffman Coding Tree[/caption] (Source: http://en.wikipedia.org/wiki/File:Huffman_tree_2.svg) So that one sounded scary, turned out to be really easy!

So What About That LZ77

The next part in this tour around compression algorithms leads us right back into history. Can't we get something new going in the compression world? LZ77 was introduced in a paper in, well, 1977 by Lempel and Ziv. LZ77 is a so-called sliding window compressor. At least they came up with impressive names back then. Why sliding window? It checks the last 2, 4 or 32 kB for repeating strings. Simple as that. Should it find one such repeating string, instead of adding that string to the output stream, it puts a back marker in that basically says: If you want to finish reading that sentence, go back to page 32, line 17, and read from character 17 to character 34. That lazy bastard!

Now I See How They Tie Together

Well I hope you do! Else I failed and you should shame in the comments. :) Or of course I can try again: First, Deflate splits up the document into blocks to make working with it easier. Then (lets assume the input isn't already compressed) it runs each block through LZ77 to get all the obvious double strings out of the way. Then it sends it through a Huffman Encoding to make it even shorter. So yes, that's deflate.

What About The Others (BZIP2 Anyone?)

Not all of them use this simple approach, because simple apparently isn't good enough. BZIP2 goes through 9 different stages.
  1. Replacing all 4 or more repeating characters with just 4 characters. (There you see, a geek had to make this algorithm for comic books! "AAAAAAAAAA!" is so much longer than "AAA\7!") So called Comic Book Encoding... Unfortunately not, it's simply called Run Length Encoding. How boring!
  2. Next they wheel around the characters. Which basically means a fancy way to sort the same letters after each other so compression is easier. The so called Burrows-Wheeler Transform.
  3. Replace all repeating characters with 0, so we have as many 0 as possible. Could also be any other character. The Move-to-Front Transform (Yes, I hate these people too at concerts).
  4. And again, after sorting everything possible, we do a Run-length Encoding.
  5. Then a Huffman Encoding. Don't tell me now I have to explain what that is!
  6. Then again we see Huffmaaa\7n! This time with multiple tables per block to make the output even shorter.
  7. 8. and 9. Add some binary trickery.
So yes, you can use a lot of other things, but it all comes back to making the output even shorter. Surprising, hu?

So That's Compression

Yes, that's compression. Use a bit of special encoding, remove duplicates, and you are halfway there. Of course, this is just for Deflate. We saw that BZIP2 is already a lot more complicated and uses a lot of different tricks to make the output even smaller, but then again, it is also slower! If you liked this post, you should follow me on Twitter and sign up for the RSS feed! B58P3MMWCDTH

How Does An ORM Work?

One of the things all of us use regularly and (most?) of us have no idea how it actually works is an Object Relation Mapper or ORM. I think it is crazy that a piece that we rely on and couldn't work without anymore is so foreign to most of us. So I decided to change that. And end of this wordy piece I want you to know how a ORM works. Of course you won't know all the details, like how some ORMs seem to be able to do more with the database that you even knew was possible. But some basic should be cleared up by then.

On DataMapper

For this, I decided to deep dive into DataMapper. DataMapper is an alternative to ActiveRecord, but with less magic, which is great if you are trying to read something. A basic DM model looks something like this:
class Duck
  include DataMapper::Resource

  property :id, Serial
  property :name, String
  property :hungry, Boolean
end
DataMapper::Resource is the base include that enables all the magic DataMapper functionality. Like in ActiveRecord, where you inherit from ActiveRecord::Base. The property method is defined in DataMapper::Model::Property. This method does a lot of stuff! In fact, it spans from line 27 to line 99! First up, it determines the class based on the type we provided in the class definition:
klass = DataMapper::Property.determine_class(type)
So if we define property :name, String, it will determine that we want a String! Mind bending, isn't it? Afterwards, it will instantiate the class (in our case DataMapper::Property::String). Then we get the now defined properties in the given repository (basically a "different" DB connection) and add our property to it. Easy. Now we have some filler code until the end, with the exception of these two babies:
create_reader_for(property)
create_writer_for(property)
This (shockingly!) creates the readers and writers for the property we defined. So what we know by now is that DataMapper keeps a list of all properties defined for a given class, so it can query, search, create etc. based on these values.

Querying DataMapper

So we can create a model, but that is really now all that a ORM does. With a ORM, we can of course query for data, and that is what makes it really interesting! So let's get to it! Querying is done in DataMapper::Model. Lets take an example query:
Duck.all
We find all on lines 341 - 348.
def all(query = Undefined)
  if query.equal?(Undefined) || (query.kind_of?(Hash) && query.empty?)
    # TODO: after adding Enumerable methods to Model, try to return self here
    new_collection(self.query.dup)
  else
    new_collection(scoped_query(query))
  end
end
So we are generating a new collection with the given query. Cool. Model#new_collection makes a new collection (line 749 for the lazy ones). D'uh! So what happens when we generate a new collection? This:
    def initialize(query, resources = nil)
      raise "#{self.class}#new with a block is deprecated" if block_given?

      @query        = query
      @identity_map = IdentityMap.new
      @removed      = Set.new

      super()

      # TODO: change LazyArray to not use a load proc at all
      remove_instance_variable(:@load_with_proc)

      set(resources) if resources
    end
(On Github) You might now realize: Nothing happens! No database connection is made, nothing! In fact, DataMapper doesn't actually perform the database call until really, really necessary! It just returns a "placeholder" collection if you want that has the information needed to actually perform the call. One way to trigger that call would be #first:
    def first(*args)
      first_arg = args.first
      last_arg  = args.last

      limit_specified = first_arg.kind_of?(Integer)
      with_query      = (last_arg.kind_of?(Hash) && !last_arg.empty?) || last_arg.kind_of?(Query)

      limit = limit_specified ? first_arg : 1
      query = with_query      ? last_arg  : {}

      query = self.query.slice(0, limit).update(query)

      if limit_specified
        all(query)
      else
        query.repository.read(query).first
      end
    end
Quite a bit of code here, luckily we can ignore most of it. For example the first few lines are just applicable if we actually pass in some options (which we won't do for now). The first really interesting line is line 16. Here we actually call query.repository.read(query).first. Here we call repository#read. So lets see this in action.
      def read(query)
        fields = query.fields
        types  = fields.map { |property| property.dump_class }

        statement, bind_values = select_statement(query)

        records = []

        with_connection do |connection|
          command = connection.create_command(statement)
          command.set_types(types)

          # Handle different splat semantics for nil on 1.8 and 1.9
          reader = if bind_values
            command.execute_reader(*bind_values)
          else
            command.execute_reader
          end

          begin
            while reader.next!
              records << Hash[ fields.zip(reader.values) ]
            end
          ensure
            reader.close
          end
        end

        records
      end
This is a lot simpler than it looks. It takes the query, constructs a select query and then executes it. Simple as that! In the end, it maps the values it got back from the Database into "real" objects and returns the given records to us.

So That's It

Lets review all that. So an ORM needs a structure to tell it how the database tables should be mapped to the object. DataMapper does this with the DataMapper::Model#property method, where ActiveRecord reads this data from a schema file. In the end, both automatically generate setters and getters for the properties. Then we have a layer that stores the queries, the results, etc. and constructs the right, database agnostic data so the query can be fulfilled. And the third layer, where I really glossed over very shortly, is the database adapter (Do you want to know some more about the database adapter themselves? Then contact me on Twitter or in the comments and tell me about it!). The database adapter knows how the query should be run in the database and knows how to transform the results back into the objects. And that's the way we query a database! If you liked this post, you should follow me on Twitter and signup for the RSS feed!

Why Having A Vision Is More Important Than Anything Else

Do you know why we advance slower than we have some decades ago? Do you know that we went from the planning stage to the moon in 11 years? That is if you consider the time from the creation of NASA to the time we actually landed on the moon. But I think the most important part was the speech that President Kennedy gave*. He said
First, I believe that this nation should commit itself to achieving the goal, before this decade is out, of landing a man on the Moon and returning him safely to the Earth
Which is, as someone from that time would have put it: "Thy be insane?". Or something like that. Now we know better, of course, because the US had this vision. So how will having a vision help you? * Actually, the most important thing was probably the Germans. Because both the Soviet Union and the United States took, after World War II, rocket engineers from Germany to kick-start their endeavors. Yes, the famous rocket we all saw flying around were based on Nazi technology. So yeah…

So, A Vision, Any Vision?

Of course, not any vision will do. For example, saying "I will be one of the people in my circle that some people will know" isn't a good vision. So isn't "I will be the biggest site on the internet in 3 months". The first one isn't really something measurable, and, honestly, lame. And the second one isn't really possible to achieve. What makes a good vision? A good vision is measurable. For example, "I will have 2000 subscribers by the end of the year" is simple to understand, measurable, and we always know where we stand. But it isn't a vision yet, it is just a goal. A vision should be something that can never really be reached, but motivates us to continue working. A vision is something that we can strive for, something that is measurable, and something that is barely reachable. So, a good vision could be:
We will become the number one payment processor in the world with customers that just love us!
Ok, the love part is a bit hard to measure, but it is a good guideline. And I'm sure we could find some grad students who would love to study, well, love. Should we give this customer a refund? Without a vision, we don't know, and to some we may give, to others we won't, which will confuse the customers even more. But with that vision? The action is clear!

A Good Vision Is A Good Guide

A vision isn't just for suits, or for people who want to sell something to suits (read: a company). No matter what your project is, I will bet that having a vision for it will help you immensely. For one, we won't have to ask if we should show an error message, or if it won't annoy users more if we actually tell them that they are stupid. If the vision is to build the most robust, secure system, then yes, we should show a general error message. If the software should be the most usable piece of software ever written, then we have to redesign it so that this error can't possibly happen. If we write a command line utility for Linux, we don't even have to care about that. :) A good vision answers probably 90% or more of all the small questions we have while developing a new project.

Sit Around, It's Story Time!

For example just recently I started on a new project. The first thing I did was asking for a vision. And they just repeated: "Oh yeah, build this basic thing here, we can try to analyze the rest later, it will be fine." So I started implementing it. And lo and behold, it wasn't what they wanted. It showed error messages that were clear to us, but not to the expected user. It required things which seemed obvious to me, but which weren't actually needed. At all. Even though it technically did everything they asked, it just didn't match with their vision. And yes, by now I do have a vision for the project. Most people actually have a vision in mind, but they somehow don't want to spell it out. It's crazy!

Vision == Motivation

A vision keeps us motivated. That's all. Lets go to the next paragraph. Oh, you want some more details? Ok. Well, it's simple. We can code, and code, and code, but we will never "get there", we will never "get closer to the finished product". We will always be stuck in a rut. Why? We reach the next goal, so our boss just figures out a new one for us. It sucks! We don't see the end of the tunnel. But a vision changes that! With a vision, we suddenly have something to strive for. The goals suddenly aren't arbitrary points that the boss sets for us, but are actually real points that bring us closer to the ultimate goal. Which is really what we want. Because then, at the end of the day, we can say: "Yeah, I got closer to the vision, and people are even more happy!"

Steve Jobs, A Visionary

One of the things that I think is just amazing about Steve Jobs is that he always had a vision. Maybe his private vision was to be a jerk and still be considered a successful, beloved business man, and if it was, he would have succeeded. But even more amazing are his visions for the company. For example, before the iPhone, no one thought this will be possible. And Steve Jobs just said: "It is, do it!". Or as Steve (being a lot more eloquent than I am) would have put it:
Do we have what it takes to establish a third category of products? The bar is pretty high. It has to be far better at doing some key things. We think we have the goods. Our most advanced technology in a magical and revolutionary device at an unbelievable price.
Now that's a vision! If you liked this post, you should follow me on Twitter and signup for the RSS feed!

What you should know about SOLID

Did you always wonder what that whole SOLID thing is all about? Why everyone is shouting at you for violating the Law of Demeter (you won't go to jail for that one, by the way)? And did you talk about soccer when people mentioned the Liskov substitution principle and looked at you confused? Then read on! I always like going to the SOLID wikipedia page because SOLID is a mnemonic acronym. SOLID stands for Single responsibility, Open-closed, Liskov substitution, Interface segregation and Dependency inversion. Makes sense yet? Lets briefly go over each one.

Single Responsibility Principle

The Single Responsibility Principle says that every class should have one job, and one job alone, but do it really, really well. If you want to know if your class is violating SRP, describe what the class does. If you use and or or in the sentence, it is a strong sign that you violate the SRP principle. Let's look at an example of a duck:
class Duck
  attr_accessible :name

  def eat
    @food -= 1
    remove_food
  end

  def swim
  end

  def remove_food
  end
end
A very simple class that does one thing: Keep track of a duck (in a pond for example). The problem with this example is that if we want to describe it, we would probably say the following: It keeps track of a duck and keeps track of the food. Yes, the dreaded and. So how would we rewrite this example? One way to encapsulate this better is to remove the food in a Food class and not manage the food supplies in the Duck itself. SRP also says that the model should hide the implementation details. That's why I use in the example eat instead of grab_food. How we eat is an implementation detail of the Duck model itself. If we want to change how food is handled, the only thing we should have to change is the Duck class itself, and nothing else.

Open/Closed Principle

The next principle is about open/closed classes. But you might now way "But Codepoet, in Ruby all classes are open!". And you would of course be right. But that's not the whole point of the open/closed principle. If we want to follow this principle, all classes cannot be changed once they implementation is completed, except for bug fixing. If the behavior should be changed for one part of the system, the class should be open, however, to be extended in another class. This is one of those principles which is sometimes hard to follow, since requirements change, and programs evolve. So sometimes a class has to be changed, or added some functionality. Always making a new class just isn't feasible. However, keeping this principle in mind is a good thing, especially with Ruby where we have easy ways to extend classes and even objects. Maybe some methods shouldn't be added to the base class, since most of the system doesn't need them, but just to one part of the system.

Liskov Substitution Principle

LSP (not to be confused with LSD) states that any object inheriting from another has to keep the constraints of the superclass intact. This includes preconditions (this value can't be null), postconditions (return value is rounded to an integer) and invariants. This sounds too abstract? I agree, so let's make an example: Given we develop a system to measure ducks. So of course we have a RealDuck class that can swim, eat, quack, etc. Swim means we give it a value, it proceeds the given distance, and then returns. If we then ask it for the position, it will have advanced the given distance. Easy as pie! Now we want to compare the real ducks with a mechanical replacement duck (the reason to do this is left to the reader). So we subclass it, add a new method replaceBatteries, and we are done! But not quite! Even though this looks perfect on paper, we just violated Liskovs Substitution Principle. Why? Because now, if the MechanicalDuck swims, it might not get a single feet ahead because the battery is empty. But that's an invariant that the RealDuck guarantees! (Except if you insure the duck before, but come on!) Most of the violations of LSP are small, but can have a huge rippling effect if the rest of the system relies on a different behavior.

Interface Segregation Principle

Another principle with an impressive name. But this one is simple to understand: Design your interfaces so that they deal with one thing, and only one. Sounds familiar? It should, because it is similar to the Single Responsibility Principle discussed above. The difference between the two is that SRP deals mostly on the class level, while ISP is concerned mostly about the overall architecture of a system. If you follow this principle, then your interfaces will be small and to the point, without anything in there that can't be separated out. Lets look at the RealDuck and the MechanicalDuck again. The duck has to eat, of course, and that's why it is in the class. But a mechanical device doesn't have to eat! So it doesn't make any sense that the mechanical duck has an eat method. So the interface is too large for this use case, and it should be refactored out.

Dependency Inversion Principle

The Dependency Inversion Principle states that high-level and low-level modules shouldn't be tightly coupled together, and that they can be replaced. This is especially useful for testing the higher-level components, because the lover-level modules that they depend on can simply be replaced by mock modules. In Ruby, one way to achieve this goal is to add an optional argument to initialize where the dependency can be replaced. To give an example:
class Duck
  def initialize(food_source = DefaultFoodSource.new)
    @food_source = food_source
  end
end
This way we can add another food source if we want (for example, an InfiniteFoodSource), so that we can test the duck in isolation.

And That's It!

Yes, that's it. We are done. No more talking about sports when Liskovs Substitution Principle comes up. No more hand stands when someone talks about Dependency Inversion. Nice! Though always keep in mind: These are just guidelines. Principles. They aren't law. If you have a good reason to go against one of these principles, then do it! If you liked this post, you should follow me on Twitter and signup for the RSS feed!