Map Iteration and Performance: One Size Does Not Fit All

Riddle me this, Batman!

Question: Given the hypothetical types Foo and Bar, and a Java variable declared as Map<Foo, Bar> map, and a method that performs some operation involving the map’s keys, values, or both: what is the optimal method for iterating over the map?  (And newbies be like, “there’s more than one way to iterate over a map?”)

Answer: It depends.  (And yes, Virginia, there are many ways to iterate over a Map.)

If you’re still reading, my guess is that you may have never considered this question.  And that’s not surprising; the typical introductory Java textbook will more than likely show you exactly one way to iterate over a Map, and hey, if it ain’t broke, don’t fix it, right?  Hold that thought.  By the time you finish this article, you will have seen at least three ways to iterate over a Map, and when faced with a Map iteration problem of your own, you should be able to analyze the problem and discern which method is the most efficient for the problem at hand.

O Map, O Map, How Do I Iterate Over Thee?

Let us count the ways!  Here are the basic algorithms for map iteration.

  1. Iterate over the keys.
    for(Foo foo : map.keySet()) { . . . }
  2. Iterate over the values.
    for(Bar bar : map.values()) { . . . }
  3. Iterate over the entries.
    for(Map.Entry<Foo, Bar> entry : map.entrySet()) { . . . }

What’s the Big O?

If you have studied computer science, you have probably heard of Big-O notation.  Simply put, Big O notation is a way to measure and compare the efficiency of algorithms and is often used by programmers to determine the best algorithm to use for solving a particular problem involving large datasets.

Big-O values are generally expressed in terms of the number of operations, on average, that an algorithm takes to run, given the size n of the dataset that they operate over.  For example, looping through a one-dimensional array of size n would require on the order of n operations, so we would say that its Big-O value is O(n).  Likewise, looping through a two-dimensional array having dimensions (m,n) would require on the order of m*iterations, and since for very large datasets, Big-O treats all constants more or less as equals, this nested loop gets a score of O(n2).

Let’s examine each of the three iteration methods listed above in terms of their Big-O time efficiencies in performing various tasks.  As a reference, we’ll use the Big-O CheatSheet by Eric Rowell in order to help us calculate the Big-O values.

Task 1: Print all the keys

This one seems like a no-brainer.  You are accessing only the keys, so let’s choose algorithm 1 from above.

public void printKeys(Map<Foo, Bar> map) {
  for(Foo foo : map.keySet()) {
    System.out.println(foo);
  }
}

Since all we are doing is iterating over the map’s keys, the Big-O for this method is O(n).

Task 2: Print all the values

This one also is a no-brainer.  You are accessing only the values, so let’s choose algorithm 2.

public void printValues(Map<Foo, Bar> map) {
  for(Bar bar : map.values()) {
    System.out.println(bar);
  }
}

Again this method completes in O(n) time.

Task 3: Print all the key-value pairs

Here’s where things get a little sticky.  If use naive algorithm 1, we get the following:

Naive method:

public void printValuesNaively(Map<Foo, Bar> map) {
  for(Foo foo : map.keySet()) {
    System.out.println(foo + "," + map.get(foo));
  }
}

The downside to this approach is that in addition to iterating through the map at O(n), we are calling the get method within the body of the loop, resulting an additional n get operations.

Now, for a HashMap, this is really not all that bad in terms of Big-O, since the get operation has O(1).  We end up with O(n) for the iteration, plus n* O(1) for the get operations, for a total of O(n) + O(n) = 2O(n) = O(2n) = O(n).  Still, it is not as efficient as it could be, as we’ll see.

However if we have a TreeMap, whose get operation has O(log n), now we end up with O(n) for the iteration, plus n* O(log n) for the get operations, for a total of:
O(n) + n*O(log n) = O(n) + O(n log n) = O(n log n).

Better method:

By using algorithm 3, we can avoid the additional n get operations that were performed in the body of the naive method.

public void printValuesMoreEfficiently(Map<Foo, Bar> map) {
  for(Map.Entry<Foo, Bar> entry : map.entrySet()) {
    System.out.println(entry.getKey() + "," + entry.getValue());
  }
}

As a result, we are back to a simple iteration, for O(n), regardless of map type.

O Map, Where Art Thou?

Now you have seen three basic algorithms for map iteration, and you now know that the choice of iteration algorithm depends primarily on whether you need to access just the keys, just the values, or both.

References

Big-O CheatSheet
The Idiot’s Guide to Big(O) Notation
Java Collections – Performance (Time Complexity) from Information Technology Gems

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s