Swift Iteration Showdown
17 Aug 2018I’ve had conversations where people don’t see the benefit of using functions like map
over hand rolling a for loop with an accumulator, with the main argument being that everyone understands a for loop so “why add complexity?”. Whilst I agree that most developers could rattle off a for loop in their sleep it doesn’t mean we should use the lower level constructs that have more risk. I compare it to a light switch in my house, I fully understand how to connect two wires to turn a light on but I’m not about to enter a dark room and start twiddling with wires when I can just use a switch.
The other trend I have picked up on during code reviews and discussions is that people are happy to use higher order functions like map
, forEach
and reduce
but because they’ve not come from a functional programming background they can often misuse these functions. To make the most of these functions and to avoid surprises for future you and your team mates there are some rules that should be obeyed (some functional languages enforce these rules). The misuse I see is generally around putting side effects where they don’t belong.
I think the best way to get people on board is to start off with examples of basic iteration and progressively change the iteration style whilst analysing the potential areas for improvement with each code listing. Hopefully by the end we’ll have a shared understanding of what is good/bad and be able to identify the tradeoffs in each iteration style.
Basic iteration
The most basic iteration in most languages is some variation of for (int i = 0; i < end; i++) { ... }
. This is not available in Swift but we can approximate it with the following
There are a lot of things about this loop that could be considered bad:
-
There is mutable state with the
index
variable.
Mutable state is more difficult to reason about than immutable state and things become easier if we can remove it. I’ve found that in longer code listings mutable state can be especially tricky as you have to read more just to keep track of the mutation. Diff tools can add to the difficulty as they may decide to collapse a long listing, meaning that you need to unfold all the code before you can sanity check that mutation is happening where you expect. - We are in control of the iteration.
There are plenty of opportunities to mess this up - here’s some:- Forget to increment the index
- Increment the index too much
- Write the condition with
<=
and get an out of bounds - Mess the condition up completely by adding more complexity - e.g.
index < collection.count || someThingThatEvaluatesToTrue
-
This only works for zero based indices.
ArraySlice
is not guaranteed to be zero based as it’s a view on the original collection. -
.count
could change between loops. -
It’s less efficient than newer forms of iteration as items are fetched one at a time.
- It’s not very well scoped.
The code block that is being evaluated during the loop causes side effects in the surrounding scope, in this case the side effect is the mutation ofindex
.
Whilst this iteration works you can see from the list above that there is plenty of room for improvement for making this safer and reducing complexity.
Safer indexing
We can make an improvement on the above by getting the indices from the collection itself
The improvements here are quite nice:
-
We’ve removed the mutable
index
variable.
Remember mutability is evil. -
We’ve removed the side effect of mutating
index
.
We still have the side effect caused by callingprint
but that’s the core of the algorithm so that’s fine. -
We are no longer in control of the iteration.
We are still in control of accessing elements at specific indices but this is an improvement. -
We are no longer calculating indices as they are given to us by the collection, which means this will work with non zero indexed collections like
ArraySlice
instances.
Remove indexing
Most of my complaints so far have been related to indexing - both calculating the correct index and then performing the access. We can improve this situation by using for/in
syntax.
The improvements here are:
-
We’ve removed indexing/manual control of the iteration.
-
Iteration can be more efficient - for example in Objective-C this would use
NSFastEnumeration
under the hood that would request batches of objects instead of fetching items one by one, which could be more efficient.
The previous disadvantages are now mostly gone. To continue evaluating the tradeoffs we need to step our thinking up a level to see the improvements that can be made. Some bad things about the above are:
- There is no easy way to intuit the intent of the loop without studying the whole body.
For example am I iterating to:- build up a new collection?
- reduce a collection down to a single value?
- purely cause side effects?
- It’s not composable.
The iteration is using a code block not a closure. This means I can’t reuse the block of code like I can with a closure.
forEach
To handle all the issues raised so far we can jump to using forEach
There are a few nice advantages with this implementation:
-
We are no longer concerned with the act of iterating.
We simply provide a block andforEach
will ensure that it is invoked for each element in the collection. -
Improved composability.
AsforEach
takes a block we can also reuse the same block in multiple places, which allows us to build our programs from smaller pieces. -
We removed boilerplate.
This is a big win as iteration can be considered as boiler plate, which we all know is tiresome and error prone to write. In addition it’s not DRY to have the same structures all over the place so it makes sense to codify patterns. -
The function signature hints at the iterations intent.
When you look at this function signature
you can see that there is no return value - this is a big hint that this function is all about side effects. If a function does not return a result then for it to add any value it needs to cause side effects. For clarification in all the examples above the side effect has been printing to stdout.
map
We get the same benefits (listed above) when we use map
. The difference here is that the function signature reveals a different intent:
This function does have a return type, which suggests this function is all about creating something and not having side effects. The function signature also shows that we need to provide a function that transforms Element
to T
, which provides the full picture - this function will create a new collection by using the provided function to map values. In terms of inferring that this function should not have side effects you can derive this from a few different principles:
-
Single Responsibility Principle.
When applied at the function level it really is about only doing one thing. In this case that one thing is transforming each element to build a new output. -
Query/Command separation.
In an ideal world a function should either be a query or a command. A query is what we have here - we invoke the function and expect a result back. A command would be where we don’t want a result but we want to do something. (There are occasions where you will have both a query/command in one function but it’s best to try and separate these things).
For comparison here’s classic iteration vs map
:
You can see that classic iteration introduces a few issues again:
-
Mutable state.
results
is declared asvar
so that it can be mutated in each run of the loop. In this listing we can see all the mutation but in a longer code listing you might have to validate that mutation isn’t happening in other places.
results
also has the position of being mutable after the loop has concluded - so even if you do the correct mutation inside the loop it doesn’t mean you are in the clear. You need to validate that no mutation is happening after the loop. -
Efficiency.
In the example abovemap
is more efficient as it will likely preallocate a new array with the correct size to take all the transformed elements. My naive implementation of classic iteration is not that clever so the array could potentially need to allocate storage multiple times. -
Side effects leaving the scope of the iteration block.
In order to mutateresults
the block inside the loop has to mutate a value outside it’s own scope. Ideally we want to keep the scope of mutation as small as possible. -
It’s wordier.
We read more than we write so it’s important that our ideas are communicated succinctly. -
I can’t easily infer intent.
Withmap
I know the function is returning a new collection by applying some transform. With the classic loop I am relying on reading the whole listing and recognising the pattern of collecting transformed items. Recognising common patterns requires practice and experience so may be harder for newer developers. -
It’s nearly all boilerplate.
In the classic iteration example nearly all of the code is boilerplate, which is obfuscating my really interesting algorithm that stringifies each item. -
It’s not an expression.
I’ve deliberately made themap
super short and inlined it inside theprint
. That’s the beautiful thing aboutmap
compared to classic iteration -map
is an expression whereas thefor/in
loop is control flow, which changes the places where they are allowed to be used.
Wrap up
The issues I have seen in code reviews are that people are happy to use forEach
, map
and other higher order functions but they then muddy the water by not following the intent of the functions and introducing side effects in functions that should be side effect free or adding an accumulator to a function that should be purely about side effects. I’ve even seen multiple scenarios where people use a map
and also have an accumulator for collecting different bits, there is almost always a more appropriate API to prevent us from the misuse.
In order to take advantage of the higher level functions we should make sure that we follow the rules around side effects and single responsibility. The aim should be to use these functions to clarify intent, the intent becomes confusing if you use a map
and also cause side effects or if you use forEach
whilst modifying an accumulator. If we stick to the rules then it makes the task of reading code simpler as the functions behave in known ways and we don’t need to become experts at recognising patterns.
This post is in no way advocating that you go and change every iteration to use higher order functions but I do hope that you’ll be able to know the tradeoffs and decide when it is appropriate to use each tool. My real aim with this post is to allow people who have read it to have a deeper appreciation of the tradeoffs around different iteration techniques and to allow for a shared base knowledge to build ideas upon.