keithbrown42

Tidbits from the Pluralsight trenches

IEnumerable considered ambiguous


It’s tempting to use IEnumerable<T> liberally in public APIs as a “least common denominator” interface that can represent any collection of objects.

Consider two very different things that could be lurking behind an IEnumerable<T>:

A: An in-memory collection such as an array or list.

B: A lazily evaluated fire hose that hides complex network logic that makes batch requests and yields objects from those batches.

For (A) it’s safe to enumerate the collection multiple times, call the Count() extension, or materialize it with ToList/ToArray. Doing the same with (B) could be disastrous.
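
To make the difference concrete, here is a minimal sketch. The FireHose method and the BatchRequests counter are hypothetical stand-ins for the network logic in (B); a real source would make a round trip where the counter is incremented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationDemo
{
    // Counts simulated batch requests; stands in for real network round trips.
    public static int BatchRequests;

    // Hypothetical (B): a lazy sequence that "fetches" one batch at a time.
    // Nothing in this method runs until a caller starts enumerating.
    public static IEnumerable<int> FireHose(int batches, int batchSize)
    {
        for (int b = 0; b < batches; b++)
        {
            BatchRequests++; // imagine a network call here
            for (int i = 0; i < batchSize; i++)
            {
                yield return b * batchSize + i;
            }
        }
    }

    // Treats the input as if it were an in-memory collection:
    // Count() is one full pass and ToList() is a second full pass.
    public static int NaiveConsumer(IEnumerable<int> items)
    {
        int count = items.Count();
        List<int> copy = items.ToList();
        return count + copy.Count;
    }
}
```

Passing an array to NaiveConsumer costs next to nothing. Passing FireHose(3, 2) leaves BatchRequests at 6, because every batch is requested twice, and nothing in the method signature warns the caller.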

How are you supposed to know what you’re getting when someone hands you an IEnumerable<T>? In the vast majority of cases, you’ve probably got (A). Indeed, most programmers will assume this and program against it as an in-memory collection, and they’ll run into trouble (perf problems or worse) when they encounter (B).

Consider a method that takes an IEnumerable<T> as an input. Is it safe to pass a (B) to that method? It’s impossible to know without looking at the implementation.

You could take an IList<T> or an ICollection<T> instead, but these interfaces have mutators on them, which introduces additional ambiguity – does your method modify the collection being passed in?
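
A small sketch of that second ambiguity. The DrainAndSum method is hypothetical; the point is only that its signature reads the same whether or not it modifies the list.

```csharp
using System;
using System.Collections.Generic;

public static class MutatorDemo
{
    // Hypothetical helper. Nothing in the signature says the argument is modified,
    // yet this implementation empties the caller's list while it sums.
    public static int DrainAndSum(IList<int> items)
    {
        int total = 0;
        while (items.Count > 0)
        {
            int last = items.Count - 1;
            total += items[last];
            items.RemoveAt(last); // mutates the caller's collection
        }
        return total;
    }
}
```

Given a List<int> containing 1, 2, 3, DrainAndSum returns 6 and leaves the caller holding an empty list. Given an int[], which also implements IList<int>, the RemoveAt call throws NotSupportedException because arrays are fixed-size. Either way the caller has to read the implementation to find out.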

I’m curious to know what conventions others are using to deal with this ambiguity.


Written by Keith Sparkjoy

August 3, 2011 at 6:36 am

Posted in Modeling

6 Responses


  1. This is a pretty common problem for library authors. Another interesting variation is whether you’re allowed to stash a reference to IEnumerable when passed one via a constructor.

    My general approach is:

    - Never stash the reference; always keep a copy made via ToList/ToArray.
    - If you run it once and only once, then it’s fine; if you need things like Count or multiple runs, then copy it first.

    An advanced wrinkle here is that most of LINQ’s lazy evaluators aren’t remote-friendly, so passing an IEnumerable across app domain boundaries almost always means doing a ToList/ToArray before doing so.

    Brad Wilson

    August 3, 2011 at 7:16 am

    • Makes sense, Brad. Prolly safest to run that ToList/ToArray right there in the ctor so you don’t end up with surprising results by evaluating later.

      keithps

      August 3, 2011 at 8:19 am
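
The copy-in-the-constructor advice from this thread might look roughly like the following. Roster and its members are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class: takes an IEnumerable<string> but never holds on to it.
public sealed class Roster
{
    private readonly List<string> _names;

    public Roster(IEnumerable<string> names)
    {
        if (names == null) throw new ArgumentNullException(nameof(names));

        // Enumerate exactly once, right here, and keep the snapshot.
        // Any lazy query behind 'names' runs now rather than at some later, surprising moment.
        _names = names.ToList();
    }

    // Safe to call repeatedly: both members work against the private copy.
    public int Count
    {
        get { return _names.Count; }
    }

    public bool Contains(string name)
    {
        return _names.Contains(name);
    }
}
```

Because the constructor snapshots its input, later changes to the caller's collection do not leak into the Roster, and a lazy source is evaluated only once.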

  2. My personal view on this is to not give something of type (B) an IEnumerable interface.

    In reality we are actually dealing with two interfaces that happen to have the same methods and name which allows them to be operated on by the same code.

    When I think about defining an interface, I think about how my class conforms to the usage of that interface. For IEnumerable, I should obey that contract, both implied and explicit.

    The implied contract of IEnumerable is that it behaves like an in-memory implementation. Most programmers using IEnumerable will want to use it with Count(), ToList(), ToArray(), etc. Therefore, if I implement IEnumerable I should be sure that I am properly fulfilling that implied contract.

    A more extreme example would be a class C that implemented IEnumerable, but when it is iterated over it destroys data permanently.

    I could make it implement IEnumerable, but I really shouldn’t. For that same reason I would suggest not having a type (B) implement IEnumerable, or at least having some kind of implementation of IEnumerable that obeys both the implicit and explicit contracts of IEnumerable.

    Perhaps even creating a new interface that inherits from IEnumerable, called IAmASlowEnumerable or something to that effect, so that a user of the interface would know what they are getting into.

    jsonmez

    August 3, 2011 at 7:25 am

    • John, that’s pretty much where I’ve landed as well. I’m thinking my IFireHose or IAmASlowEnumerable might look something like this:

      public interface IFireHose {
          IEnumerable GetNextBatchOfObjects();
      }

      keithps

      August 3, 2011 at 8:17 am

      • I like that approach, this way your data can still be enumerated, but at the same time the user of that data is aware of how to properly interact with it.

        jsonmez

        August 3, 2011 at 12:49 pm
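
One way the batch-oriented interface from this thread could be consumed is sketched below. Several things here are assumptions rather than anything stated in the comments: the generic type parameter, the convention that an empty batch means the source is exhausted, and the FakeFireHose and FireHoseConsumer names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Generic variant of the interface proposed above; the type parameter is an assumption.
public interface IFireHose<T>
{
    // Assumed convention for this sketch: an empty batch means the hose has run dry.
    IEnumerable<T> GetNextBatchOfObjects();
}

// Hypothetical in-memory stand-in for a network-backed source.
public sealed class FakeFireHose : IFireHose<int>
{
    private readonly Queue<int[]> _batches;

    public FakeFireHose(IEnumerable<int[]> batches)
    {
        _batches = new Queue<int[]>(batches);
    }

    public IEnumerable<int> GetNextBatchOfObjects()
    {
        // Each call hands out the next batch, or an empty one when nothing is left.
        return _batches.Count > 0 ? _batches.Dequeue() : new int[0];
    }
}

public static class FireHoseConsumer
{
    // Pulls one batch per call and materializes it, so counting or
    // re-reading a batch never triggers another request.
    public static int SumAll(IFireHose<int> hose)
    {
        int total = 0;
        while (true)
        {
            List<int> batch = hose.GetNextBatchOfObjects().ToList();
            if (batch.Count == 0) break;
            total += batch.Sum();
        }
        return total;
    }
}
```

The expensive step is now an explicit method call, so a caller cannot trigger it by accident just by calling Count() or enumerating twice.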

  3. This, incidentally, is an excellent example of a fundamental problem at the heart of the concept of duck typing. In fact the thing people call “duck typing” is misnamed – it’s really Humpty Dumpty typing. In Lewis Carroll’s “Through the Looking Glass”, we find:

    “When *I* use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”

    That is the true spirit of so-called duck typing: an unsupported assumption that words mean whatever you think they mean at any particular moment. When something “Quacks”, nothing actually verifies the “like a duck” part implied by duck typing. Verifying that something really is quacking “like a duck” and not like, say, Dr. Nick Riviera (hi everybody!) is what static typing is for, and is pretty much the opposite of what most people seem to mean when they say duck typing.

    In cases where you really need to know whether something enumerates like a firehose, or like an array, duck typing is a hindrance, not a help. The solution appears to be more expressive static typing, not less.

    Ian Griffiths (@idg10)

    January 22, 2012 at 4:58 am

