Taking Up Collections

Introducing arrays and hashes in Ruby

Posted Oct. 3, 2015

In the Ruby programming language, two main data types carry the bulk of the load when it comes to handling collections: arrays and hashes. Understanding how arrays and hashes are alike, and how they are different, is critical to programming successfully in Ruby.

Demanding arrays

An array is a list of values arranged in a sequential order. The easiest way to create an array is to declare a variable, and set it equal to the contents of the list we intend to save. It is common convention to give arrays variable names that are plural, to suggest that the variable represents a collection of items. Here, we make an array of our favorite pies:

pies = [ "pumpkin", "apple", "pecan" ]

There are other ways to create an array, such as by calling Array.new or by using the Array() method, but those techniques are beyond the scope of this post.

Since arrays store values in a specific sequence, each value is identified by an integer index number which records its place in the sequence. In typical computer-science fashion, array indexes begin counting at 0, so the first value in the array has the index 0, the second value has the index 1, and so on. To access a value in an array, we use the name of the array, followed by the value's index in square brackets. Here, we want the second value in our pies array, so we use the index [1].

$ pies[1]
=> "apple"
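
Here is a small sketch of the indexing rules as a plain Ruby script rather than a console session. The negative index and the out-of-range lookup go a little beyond what I described above, so treat them as bonus material:

```ruby
pies = ["pumpkin", "apple", "pecan"]

first_pie  = pies[0]   # index 0 is the first value
second_pie = pies[1]   # index 1 is the second value
last_pie   = pies[-1]  # negative indexes count backward from the end
missing    = pies[10]  # an index past the end gives nil instead of an error

puts first_pie
puts second_pie
puts last_pie
p missing
```

Getting nil back for an index that doesn't exist is worth remembering, since it can hide an off-by-one mistake.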

Calling puts with an array as its argument will print each member of the array on a separate line, like this:

$ puts pies
pumpkin
apple
pecan
=> nil

(Recall that puts always returns the value nil.)

We can add another value to our array using the push method. The value will be added to the end of the array, and is included as the argument to the push method. Parentheses around the argument are optional. Continuing with our pies array:

$ pies.push( "cherry" )
=> ["pumpkin", "apple", "pecan", "cherry"]
$ pies.push "mud"
=> ["pumpkin", "apple", "pecan", "cherry", "mud"]

You can also append values to an array with the "shovel" operator, <<. The shovel operator can be chained to add several values on a single line, as well. We'll start a new array of integers for this example.

$ numbers = [1, 2, 3]
=> [1, 2, 3]
$ numbers << 4
=> [1, 2, 3, 4]
$ numbers << 5 << 6 << 7
=> [1, 2, 3, 4, 5, 6, 7]
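
One more appending trick, offered as a sketch: push will take several arguments at once, which does the same job as chaining the shovel operator. The multi-argument call isn't something I covered above, so consider it an aside:

```ruby
numbers = [1, 2, 3]

numbers.push(4, 5)   # push accepts several values in one call
numbers << 6 << 7    # each << returns the array itself, so the calls chain

p numbers            # prints [1, 2, 3, 4, 5, 6, 7]
```

Both push and << change the original array in place rather than building a new one.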

To remove the last value from an array, you use the opposite of push, which is pop. This method permanently removes the last value, and returns its value.

$ new_numbers = [10, 20, 30]
=> [10, 20, 30]
$ new_numbers.pop
=> 30
$ new_numbers
=> [10, 20] # array contents after pop

If you want to remove the first value in the array instead, you can use shift, so named because it removes the first value from the array, and shifts all the remaining values down one index. The opposite of shift is unshift, which takes a value as an argument (again, parentheses optional), and inserts that value into the first position of the array, while "unshifting" the remaining elements up one index. shift returns the value removed from the array, while unshift returns the entire array, including the newly-added value in the first position.

$ other_numbers = [25, 50, 75]
=> [25, 50, 75]
$ other_numbers.shift
=> 25
$ other_numbers
=> [50, 75] # array contents after shift
$ other_numbers.unshift(45)
=> [45, 50, 75] # array contents after unshift
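
To see all four methods side by side, here is a short script. The array name queue and its values are just placeholders I made up for illustration:

```ruby
queue = [25, 50, 75]

last_value  = queue.pop     # removes and returns the last value, 75
first_value = queue.shift   # removes and returns the first value, 25
queue.push(100)             # adds 100 to the end: [50, 100]
queue.unshift(10)           # adds 10 to the front: [10, 50, 100]

p last_value
p first_value
p queue
```

A handy way to keep them straight: push and pop work on the end of the array, while unshift and shift work on the front.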

There are literally dozens of other methods that can be used to manipulate arrays, adding values to them, taking values away, sorting them, re-ordering them, finding specific values or ranges within them, etc. You can find a complete listing of array methods in the official Ruby documentation, and a tutorial covering Ruby arrays at sitepoint.com.
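
As a taste of those other methods, here is a minimal sketch using a handful I picked more or less at random (length, include?, index, sort, and reverse). The official docs remain the full reference:

```ruby
pies = ["pumpkin", "apple", "pecan"]

count      = pies.length             # how many values the array holds
has_apple  = pies.include?("apple")  # true if the value is present
pecan_spot = pies.index("pecan")     # index of the first match
sorted     = pies.sort               # a new array in alphabetical order
backward   = pies.reverse            # a new array in reverse order

p count, has_apple, pecan_spot
p sorted
p backward
p pies   # sort and reverse returned new arrays, so pies is unchanged
```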

You may have noticed that in all my example arrays so far, every item in each array has been of the same data type. Our pies array consisted entirely of strings, while numbers was all integers. In fact, Ruby does not require the values in an array to share a data type. You can mix and match as many data types in a single array as you please. The following is a perfectly legal array in Ruby:

$ peyton_manning = [18, "QB", 65.4, ["Colts", "Broncos"]]
=> [18, "QB", 65.4, ["Colts", "Broncos"]]

This array contains an integer, a string, a float, and even another array, and it's all perfectly legal. But just because you can do something, doesn't mean you necessarily should do it. The values in this array all describe NFL quarterback Peyton Manning, but in order for them to be useful, we have to memorize which index in the array points to which piece of information. For example, to find out what Manning's pass-completion percentage is, we have to know to ask for peyton_manning[2]. You'd think there'd be a better way to store collections of related data of differing types, and there is. Ruby calls it a hash.
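
To make the memorization problem concrete, here is a sketch of what reading from that array looks like. The variable names completion_pct and second_team are my own labels, not anything Ruby knows about:

```ruby
peyton_manning = [18, "QB", 65.4, ["Colts", "Broncos"]]

# Nothing in the array says what each position means; we just have to know.
completion_pct = peyton_manning[2]     # the float in the third slot
second_team    = peyton_manning[3][1]  # second value of the nested array

p completion_pct
p second_team
```

If someone later inserted a new value at the front of the array, every one of those index numbers would silently point at the wrong thing.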

Making a hash of things

Like arrays, a Ruby hash is a data type storing a collection of values, and like arrays, hashes can contain data of any and all types, in any combination. The difference between an array and a hash is in how you access that data.

Remember that an array is an ordered set of values, and you access the values you want using a numeric index. Since Ruby 1.9, a hash also keeps its contents in order, remembering the sequence in which they were inserted (in earlier Ruby versions, hashes weren't ordered), but you don't access its values using indexes. Instead, data in a hash is stored as key-value pairs. That is, for every value in the hash, there is a key that lets you access it. For example, if I created a hash to keep track of my pets, I might store within it the key :dogs, accessing the value "Moose". Hashes are commonly created in a manner very similar to how we set up arrays, using a structure called a "hash literal". A hash literal creating my full pets hash might look something like this:

$ jeffs_pets = {
$ :dogs => "Moose",
$ :turtles => "Max",
$ :fishes => ["Larry", "Curly", "Moesha"]
$ }
=> {:dogs=>"Moose", :turtles=>"Max", :fishes=>["Larry", "Curly", "Moesha"]}

For readability, I typed each key-value pair in the hash on its own line, but I could have legally crammed them all onto a single line. Note that while we used square brackets to contain our array, hashes are held in curly brackets. For each key-value pair in the hash, the key is listed first, followed by the "hash rocket" operator consisting of an equals sign and a greater-than sign, and then the value. Values can be of any type; in this hash, the first two values are strings, but the third value is an array, which is how we assign more than one value to a single key.

Technically, keys can be any value type as well, but hashes are usually constructed using symbols as keys. (Symbols are a special type of object, similar to strings, but with some special properties we don't need to delve into here.) Symbols can be recognized because they always begin with a colon, as in :turtles.
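
As a rough illustration of keys of other types, here is a hash that mixes a string key, an integer key, and a symbol key. The hash name grab_bag and its contents are invented for this example, and you would rarely mix key types like this in real code. The square-bracket lookups it uses are explained a little further down:

```ruby
grab_bag = { "apple" => 3.5, 42 => "answer", :pecan => 4.25 }

by_string  = grab_bag["apple"]   # string key
by_integer = grab_bag[42]        # integer key
by_symbol  = grab_bag[:pecan]    # symbol key
mismatch   = grab_bag["pecan"]   # nil: the string "pecan" is not the symbol :pecan

p by_string, by_integer, by_symbol, mismatch
```

That last line is a common stumbling block: a string and a symbol that look alike are still two different keys.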

If you are using symbols as the keys to your hash, Ruby offers an even more friendly alternative syntax for the hash literal, which eliminates the hash rocket operator. Using this method, each key-value pair is entered key first, followed by a colon, then the value. Note that this alternative syntax only works when the hash keys are symbols. The following version produces the exact same hash of my pets as the one with the hash rockets, above:

$ jeffs_pets = {
$ dogs: "Moose",
$ turtles: "Max",
$ fishes: ["Larry", "Curly", "Moesha"]
$ }
=> {:dogs=>"Moose", :turtles=>"Max", :fishes=>["Larry", "Curly", "Moesha"]}
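
If you'd like to convince yourself that the two syntaxes really do build the same hash, a quick sketch like this one will do it. The variable names rocket_style and colon_style are just labels I chose for the comparison:

```ruby
rocket_style = { :dogs => "Moose", :turtles => "Max" }
colon_style  = { dogs: "Moose", turtles: "Max" }

same_hash = (rocket_style == colon_style)  # compares keys and values
key_list  = colon_style.keys               # the keys are still symbols

p same_hash
p key_list
```

Notice that the colon moves from the front of the key to the back in the shorter syntax, but the key Ruby stores is the same symbol either way.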

The syntax to access the values within a hash is similar to that used to get to the values within an array, but in place of the array's numeric indexes, we use the hash's keys inside the square brackets. Let's check out the names of my pets:

$ jeffs_pets[:dogs]
=> "Moose"
$ jeffs_pets[:fishes]
=> ["Larry", "Curly", "Moesha"]
$ jeffs_pets[:cats]
=> nil

So, entering jeffs_pets[:dogs] returned my dog's name, Moose, and jeffs_pets[:fishes] returned the full array containing the names of all my fish. When we asked for my cat's name, with jeffs_pets[:cats], Ruby shrugged and said "nil," which is perfect, since nil, zilch, nada, bupkis, is exactly how many cats I own. In fact, unless you have given the hash a different default value, Ruby will return nil every time you ask for a key that is not included in the hash. Incidentally, if we want to get to the name of my second fish, we'd just add a second set of square brackets behind the hash name: first the hash key, then the numeric index within the array, like this:

$ jeffs_pets[:fishes][1]
=> "Curly"
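
Because a missing key quietly returns nil, it can help to know a few ways of checking for one. The sketch below uses key?, fetch, and dig, none of which I covered above, and dig assumes Ruby 2.3 or later:

```ruby
jeffs_pets = {
  dogs: "Moose",
  turtles: "Max",
  fishes: ["Larry", "Curly", "Moesha"]
}

no_cat      = jeffs_pets[:cats]                # nil, because the key is missing
has_cats    = jeffs_pets.key?(:cats)           # false
cat_or_text = jeffs_pets.fetch(:cats, "none")  # falls back to the given default
second_fish = jeffs_pets[:fishes][1]           # hash key first, then array index
safe_lookup = jeffs_pets.dig(:cats, 0)         # nil instead of an error on the missing key

p no_cat, has_cats, cat_or_text, second_fish, safe_lookup
```

Writing jeffs_pets[:cats][0] instead of using dig would raise a NoMethodError, since the first lookup returns nil and nil has no square-bracket method.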

Although Ruby does remember the order of the key-value pairs in a hash, programmers for the most part don't care. Because of this, we don't need a bunch of different methods for accessing, inserting, or removing items in the hash according to their position in the sequence. Adding a new key-value pair is a lot like declaring and setting the value of a variable: you name it, then set it equal to the value.

$ jeffs_pets[:chickens] = "Colonel"
=> "Colonel"
$ jeffs_pets
=> {:dogs=>"Moose", :turtles=>"Max", :fishes=>["Larry", "Curly", "Moesha"], :chickens=>"Colonel"}
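
The same assignment syntax also replaces the value of a key that already exists. Here is a minimal sketch, in which the replacement name "Bullwinkle" is purely hypothetical:

```ruby
jeffs_pets = { dogs: "Moose", turtles: "Max" }

jeffs_pets[:chickens] = "Colonel"   # new key, so the pair is added at the end
jeffs_pets[:dogs] = "Bullwinkle"    # existing key, so the old value is replaced

key_order = jeffs_pets.keys         # :dogs keeps its original position
dog_name  = jeffs_pets[:dogs]

p key_order
p dog_name
```

Since each key can appear only once in a hash, assigning to an existing key never creates a duplicate.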

To remove a key-value pair from a hash, you use the delete method, with the name of the key as the argument. This will return the value of the deleted key. Poor Max...

$ jeffs_pets.delete(:turtles)
=> "Max"
$ jeffs_pets
=> {:dogs=>"Moose", :fishes=>["Larry", "Curly", "Moesha"], :chickens=>"Colonel"}
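
One detail worth a quick sketch: asking delete to remove a key that isn't in the hash does no harm. It returns nil and leaves the hash alone. The variable names removed and nothing are my own:

```ruby
jeffs_pets = { dogs: "Moose", turtles: "Max" }

removed = jeffs_pets.delete(:turtles)  # returns the deleted value, "Max"
nothing = jeffs_pets.delete(:cats)     # no such key, so this returns nil

p removed
p nothing
p jeffs_pets.keys   # only :dogs is left
```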

As is the case for arrays, Ruby offers dozens of methods for manipulating hashes. You can read about hash methods in the Ruby docs, or get a more newb-friendly tutorial on hashes at sitepoint.com.

I'll continue my discussion of Ruby arrays and hashes in my next post, Iteration Nation.
