Btukfyl

Question

Based on my answer I have my implementation of linq Zip operator which operates on different length lists, and loops shortest list.

My implementation:

using System;

using System.Collections.Generic;

using System.Linq;



namespace SO

{

    internal class Program

    {

        public static void Main(string args)

        {

            List<String> listA = new List<string> {"a", "b", "c", "d", "e", "f", "g"};

            List<String> listB = new List<string> {"1", "2", "3"};



            var mix = listA.ZipNew(listB, (l1, l2) => new {l1, l2}).SelectMany(x => x);



            foreach (var m in mix)

            {

                Console.WriteLine(m);

            }

        }

    }



    public static class Impl

    {               

        public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>( 

            this IEnumerable<TFirst> first, 

            IEnumerable<TSecond> second, 

            Func<TFirst, TSecond, TResult> resultSelector) 

        { 

            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 

            using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())

            {

                var i1 = true;

                var i2 = true;

                var i1Shorter = false;

                var i2Shorter = false;

                var firstRun = true;





                while(true) 

                {

                    i1 = iterator1.MoveNext();

                    i2 = iterator2.MoveNext();



                    if (!i1 && (i1Shorter || firstRun))

                    {

                        iterator1.Reset();

                        i1 = iterator1.MoveNext();

                        i1Shorter = true;

                        firstRun = false;

                    }



                    if (!i2 && (i2Shorter || firstRun))

                    {

                        iterator2.Reset();

                        i2 = iterator2.MoveNext();

                        i2Shorter = true;

                        firstRun = false;

                    }



                    if (!(i1 && i2))

                    {

                        break;

                    }



                    yield return resultSelector(iterator1.Current, iterator2.Current); 

                }

            } 

        }

    }

}

And I wonder if this implementation could be improved somehow, what could be improved, for better readability or speed.

Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended. — 16 hours ago
Never Reset an enumerator. Just get a new enumerator. Reset was a misfeature intended for COM interop scenarios. — 11 hours ago

Peter Taylor 15.3k2657 · Answer 1 · 2018-11-21 14:54:59Z

    public static class Impl

    {

        public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(

Names? The class would be more descriptive as something like LinqExtensions; the method something like ZipLooped.

            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 

            using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())

            {

                var i1 = true;

                var i2 = true;

                var i1Shorter = false;

                var i2Shorter = false;

                var firstRun = true;

The iterators have useful names, but what does i1 mean? And why five variables to track the state of two iterators? IMO it would be simpler as

                var firstEnded = false;

                var secondEnded = false;



                while (true) 

                {

                    if (!iterator1.MoveNext())

                    {

                        if (secondEnded) yield break;

                        firstEnded = true;

                        iterator1.Reset();

                        if (!iterator1.MoveNext()) yield break;

                    }

                    if (!iterator2.MoveNext())

                    {

                        if (firstEnded) yield break;

                        secondEnded = true;

                        iterator2.Reset();

                        if (!iterator2.MoveNext()) yield break;

                    }



                    yield return resultSelector(iterator1.Current, iterator2.Current); 

                }

and the almost repeated code might be worth pulling out as an inner method:

                var firstEnded = false;

                var secondEnded = false;



                bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)

                {

                    if (it.MoveNext()) return true;

                    // `it` has done a full cycle; if the other one has too, we've finished

                    if (otherEnded) return false;

                    thisEnded = true;

                    // Start again, although if `it` is empty we need to abort

                    it.Reset();

                    return it.MoveNext();

                }



                while (true)

                {

                    if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;

                    if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;

                    yield return resultSelector(iterator1.Current, iterator2.Current); 

                }

I notice that you've decided to yield break if either of the enumerables is empty. Would an exception be a better choice?

I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-) — 14 hours ago
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty. — 13 hours ago
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection. — 13 hours ago
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P — 13 hours ago

Pieter Witvoet 4,846723 · Answer 2 · 2018-11-21 15:12:29Z

I noticed a few things that can be improved:

Not all enumerators support Reset. Generator methods don't, for example, so calling ZipNew on the result of a ZipNew call will fail with a NotSupportedException. Obtaining a new enumerator should work, at the cost of having to replace the convenient using statements with try/finally constructions.

There's no need to call Reset when a collection is empty. I'd probably add a special case for that.

Passing null causes either an unspecific NullReferenceException or an ArgumentNullException with parameter name source to be thrown. Throwing ArgumentNullExceptions with accurate parameter names would be more helpful.

i1 and i2 can be declared inside the while loop.

score 2 · Answer 3 · 2018-11-22 05:25:06Z

If you should respect that not all Enumerators implement Reset() then it is not possible to use using statements for the two IEnumerators. But you could introduce an IEnumerator<TResult> for the zipped result and use it like this:

public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(

this IEnumerable<TFirst> first,

IEnumerable<TSecond> second,

Func<TFirst, TSecond, TResult> resultSelector)

{

  using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))

  {

    while (zipEnumerator.MoveNext())

    {

      yield return zipEnumerator.Current;

    }

  }

}

In this way you're back on the using track.

The ZipEnumerator it self could be something like:

  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>

  {

    IEnumerable<T> m_dataT;

    IEnumerable<S> m_dataS;

    IEnumerator<T> m_enumerT;

    IEnumerator<S> m_enumerS;

    List<IDisposable> m_disposables = new List<IDisposable>();

    Func<T, S, TResult> m_selector;

    bool m_secondReloaded = false;

    bool m_first = true;



    public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)

    {

      m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));

      m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));

      m_selector = selector ?? throw new ArgumentNullException(nameof(selector));



    }



    public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);



    object IEnumerator.Current => Current;



    public void Dispose()

    {

      foreach (IDisposable disposable in m_disposables)

      {

        disposable.Dispose();

      }

      m_disposables.Clear();

    }



    private IEnumerator<T> GetTEnumerator()

    {

      var enumerator = m_dataT.GetEnumerator();

      m_disposables.Add(enumerator);

      return enumerator;

    }



    private IEnumerator<S> GetSEnumerator()

    {

      var enumerator = m_dataS.GetEnumerator();

      m_disposables.Add(enumerator);

      return enumerator;

    }



    public bool MoveNext()

    {

      m_enumerT = m_enumerT ?? GetTEnumerator();

      m_enumerS = m_enumerS ?? GetSEnumerator();



      if (m_first)

      {

        if (m_enumerT.MoveNext())

        {

          if (!m_enumerS.MoveNext())

          {

            m_enumerS = GetSEnumerator();

            m_secondReloaded = true;

            if (!m_enumerS.MoveNext())

              return false;

          }

          return true;

        }

        else

        {

          m_first = false;

        }

      }



      if (!m_first && !m_secondReloaded)

      {

        if (m_enumerS.MoveNext())

        {

          if (!m_enumerT.MoveNext())

          {

            m_enumerT = GetTEnumerator();

            if (!m_enumerT.MoveNext())

              return false;

          }



          return true;

        }

      }



      return false;

    }



    public void Reset()

    {

      m_secondReloaded = false;

      m_first = true;

      m_enumerT = null;

      m_enumerS = null;

      Dispose();

    }

  }

It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator itself is disposed off?

The MoveNext() method went a little more complicated than I like, so feel free to edit or suggest improvements.

Edit

A refactored version of ZipEnumerator:

  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>

  {

    IEnumerable<T> m_dataT;

    IEnumerable<S> m_dataS;

    IEnumerator<T> m_enumeratorT;

    IEnumerator<S> m_enumeratorS;

    List<IDisposable> m_disposables = new List<IDisposable>();

    Func<T, S, TResult> m_selector;

    bool m_secondReloaded = false;

    bool m_isInitilized = false;



    public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)

    {

      m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));

      m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));

      m_selector = selector ?? throw new ArgumentNullException(nameof(selector));

    }



    public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);

    object IEnumerator.Current => Current;



    public void Dispose()

    {

      DoDispose();

    }



    private void RegisterDisposable(IDisposable disposable)

    {

      m_disposables.Add(disposable);

      if (m_disposables.Count > 10)

      {

        DoDispose();

      }

    }



    private void DoDispose()

    {

      foreach (IDisposable disposable in m_disposables)

      {

        disposable.Dispose();

      }

      m_disposables.Clear();

    }



    private IEnumerator<T> GetTEnumerator()

    {

      var enumerator = m_dataT.GetEnumerator();

      RegisterDisposable(enumerator);

      return enumerator;

    }



    private IEnumerator<S> GetSEnumerator()

    {

      var enumerator = m_dataS.GetEnumerator();

      RegisterDisposable(enumerator);

      return enumerator;

    }



    private Func<bool> CurrentMover = null;



    private bool FirstMover()

    {

      if (m_enumeratorT.MoveNext())

      {

        if (!m_enumeratorS.MoveNext())

        {

          m_enumeratorS = GetSEnumerator();

          m_secondReloaded = true;

          if (!m_enumeratorS.MoveNext())

            return false;

        }

        return true;

      }

      else if (!m_secondReloaded)

      {

        CurrentMover = SecondMover;

        return CurrentMover();

      }



      return false;

    }



    private bool SecondMover()

    {

      if (m_enumeratorS.MoveNext())

      {

        if (!m_enumeratorT.MoveNext())

        {

          m_enumeratorT = GetTEnumerator();

          if (!m_enumeratorT.MoveNext())

            return false;

        }



        return true;

      }



      return false;

    }



    private void Initialize()

    {

      m_enumeratorT = GetTEnumerator();

      m_enumeratorS = GetSEnumerator();

      CurrentMover = FirstMover;

      m_isInitilized = true;

    }



    public bool MoveNext()

    {

      if (!m_isInitilized)

      {

        Initialize();

      }

      return CurrentMover();

    }



    public void Reset()

    {

      m_isInitilized = false;

      m_secondReloaded = false;

      CurrentMover = null;

      m_enumeratorT = null;

      m_enumeratorS = null;

      DoDispose();

    }

  }

I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-) — 11 hours ago
I think the word unconventional describes them pretty good... although ts and ss were below that level ;-] — 11 hours ago
I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public. — 11 hours ago

Peter Taylor 15.3k2657 · Answer 4 · 2018-11-21 14:54:59Z

    public static class Impl

    {

        public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(

Names? The class would be more descriptive as something like LinqExtensions; the method something like ZipLooped.

            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 

            using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())

            {

                var i1 = true;

                var i2 = true;

                var i1Shorter = false;

                var i2Shorter = false;

                var firstRun = true;

The iterators have useful names, but what does i1 mean? And why five variables to track the state of two iterators? IMO it would be simpler as

                var firstEnded = false;

                var secondEnded = false;



                while (true) 

                {

                    if (!iterator1.MoveNext())

                    {

                        if (secondEnded) yield break;

                        firstEnded = true;

                        iterator1.Reset();

                        if (!iterator1.MoveNext()) yield break;

                    }

                    if (!iterator2.MoveNext())

                    {

                        if (firstEnded) yield break;

                        secondEnded = true;

                        iterator2.Reset();

                        if (!iterator2.MoveNext()) yield break;

                    }



                    yield return resultSelector(iterator1.Current, iterator2.Current); 

                }

and the almost repeated code might be worth pulling out as an inner method:

                var firstEnded = false;

                var secondEnded = false;



                bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)

                {

                    if (it.MoveNext()) return true;

                    // `it` has done a full cycle; if the other one has too, we've finished

                    if (otherEnded) return false;

                    thisEnded = true;

                    // Start again, although if `it` is empty we need to abort

                    it.Reset();

                    return it.MoveNext();

                }



                while (true)

                {

                    if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;

                    if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;

                    yield return resultSelector(iterator1.Current, iterator2.Current); 

                }

I notice that you've decided to yield break if either of the enumerables is empty. Would an exception be a better choice?

I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-) — 14 hours ago
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty. — 13 hours ago
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection. — 13 hours ago
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P — 13 hours ago

Pieter Witvoet 4,846723 · Answer 5 · 2018-11-21 15:12:29Z

I noticed a few things that can be improved:

Not all enumerators support Reset. Generator methods don't, for example, so calling ZipNew on the result of a ZipNew call will fail with a NotSupportedException. Obtaining a new enumerator should work, at the cost of having to replace the convenient using statements with try/finally constructions.

There's no need to call Reset when a collection is empty. I'd probably add a special case for that.

Passing null causes either an unspecific NullReferenceException or an ArgumentNullException with parameter name source to be thrown. Throwing ArgumentNullExceptions with accurate parameter names would be more helpful.

i1 and i2 can be declared inside the while loop.

score 2 · Answer 6 · 2018-11-22 05:25:06Z

If you should respect that not all Enumerators implement Reset() then it is not possible to use using statements for the two IEnumerators. But you could introduce an IEnumerator<TResult> for the zipped result and use it like this:

public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(

this IEnumerable<TFirst> first,

IEnumerable<TSecond> second,

Func<TFirst, TSecond, TResult> resultSelector)

{

  using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))

  {

    while (zipEnumerator.MoveNext())

    {

      yield return zipEnumerator.Current;

    }

  }

}

In this way you're back on the using track.

The ZipEnumerator it self could be something like:

  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>

  {

    IEnumerable<T> m_dataT;

    IEnumerable<S> m_dataS;

    IEnumerator<T> m_enumerT;

    IEnumerator<S> m_enumerS;

    List<IDisposable> m_disposables = new List<IDisposable>();

    Func<T, S, TResult> m_selector;

    bool m_secondReloaded = false;

    bool m_first = true;



    public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)

    {

      m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));

      m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));

      m_selector = selector ?? throw new ArgumentNullException(nameof(selector));



    }



    public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);



    object IEnumerator.Current => Current;



    public void Dispose()

    {

      foreach (IDisposable disposable in m_disposables)

      {

        disposable.Dispose();

      }

      m_disposables.Clear();

    }



    private IEnumerator<T> GetTEnumerator()

    {

      var enumerator = m_dataT.GetEnumerator();

      m_disposables.Add(enumerator);

      return enumerator;

    }



    private IEnumerator<S> GetSEnumerator()

    {

      var enumerator = m_dataS.GetEnumerator();

      m_disposables.Add(enumerator);

      return enumerator;

    }



    public bool MoveNext()

    {

      m_enumerT = m_enumerT ?? GetTEnumerator();

      m_enumerS = m_enumerS ?? GetSEnumerator();



      if (m_first)

      {

        if (m_enumerT.MoveNext())

        {

          if (!m_enumerS.MoveNext())

          {

            m_enumerS = GetSEnumerator();

            m_secondReloaded = true;

            if (!m_enumerS.MoveNext())

              return false;

          }

          return true;

        }

        else

        {

          m_first = false;

        }

      }



      if (!m_first && !m_secondReloaded)

      {

        if (m_enumerS.MoveNext())

        {

          if (!m_enumerT.MoveNext())

          {

            m_enumerT = GetTEnumerator();

            if (!m_enumerT.MoveNext())

              return false;

          }



          return true;

        }

      }



      return false;

    }



    public void Reset()

    {

      m_secondReloaded = false;

      m_first = true;

      m_enumerT = null;

      m_enumerS = null;

      Dispose();

    }

  }

It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator itself is disposed off?

The MoveNext() method went a little more complicated than I like, so feel free to edit or suggest improvements.

Edit

A refactored version of ZipEnumerator:

  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>

  {

    IEnumerable<T> m_dataT;

    IEnumerable<S> m_dataS;

    IEnumerator<T> m_enumeratorT;

    IEnumerator<S> m_enumeratorS;

    List<IDisposable> m_disposables = new List<IDisposable>();

    Func<T, S, TResult> m_selector;

    bool m_secondReloaded = false;

    bool m_isInitilized = false;



    public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)

    {

      m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));

      m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));

      m_selector = selector ?? throw new ArgumentNullException(nameof(selector));

    }



    public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);

    object IEnumerator.Current => Current;



    public void Dispose()

    {

      DoDispose();

    }



    private void RegisterDisposable(IDisposable disposable)

    {

      m_disposables.Add(disposable);

      if (m_disposables.Count > 10)

      {

        DoDispose();

      }

    }



    private void DoDispose()

    {

      foreach (IDisposable disposable in m_disposables)

      {

        disposable.Dispose();

      }

      m_disposables.Clear();

    }



    private IEnumerator<T> GetTEnumerator()

    {

      var enumerator = m_dataT.GetEnumerator();

      RegisterDisposable(enumerator);

      return enumerator;

    }



    private IEnumerator<S> GetSEnumerator()

    {

      var enumerator = m_dataS.GetEnumerator();

      RegisterDisposable(enumerator);

      return enumerator;

    }



    private Func<bool> CurrentMover = null;



    private bool FirstMover()

    {

      if (m_enumeratorT.MoveNext())

      {

        if (!m_enumeratorS.MoveNext())

        {

          m_enumeratorS = GetSEnumerator();

          m_secondReloaded = true;

          if (!m_enumeratorS.MoveNext())

            return false;

        }

        return true;

      }

      else if (!m_secondReloaded)

      {

        CurrentMover = SecondMover;

        return CurrentMover();

      }



      return false;

    }



    private bool SecondMover()

    {

      if (m_enumeratorS.MoveNext())

      {

        if (!m_enumeratorT.MoveNext())

        {

          m_enumeratorT = GetTEnumerator();

          if (!m_enumeratorT.MoveNext())

            return false;

        }



        return true;

      }



      return false;

    }



    private void Initialize()

    {

      m_enumeratorT = GetTEnumerator();

      m_enumeratorS = GetSEnumerator();

      CurrentMover = FirstMover;

      m_isInitilized = true;

    }



    public bool MoveNext()

    {

      if (!m_isInitilized)

      {

        Initialize();

      }

      return CurrentMover();

    }



    public void Reset()

    {

      m_isInitilized = false;

      m_secondReloaded = false;

      CurrentMover = null;

      m_enumeratorT = null;

      m_enumeratorS = null;

      DoDispose();

    }

  }

I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-) — 11 hours ago
I think the word unconventional describes them pretty good... although ts and ss were below that level ;-] — 11 hours ago
I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public. — 11 hours ago

搜尋此網誌

Btukfyl

Custom implementation of the linq Zip operator for different length lists

3 Answers
3

Your Answer

Post as a guest

3 Answers
3

3 Answers
3

Post as a guest

Popular posts from this blog

A CLEAN and SIMPLE way to add appendices to Table of Contents and bookmarks

Calculate evaluation metrics using cross_val_predict sklearn

Insert data from modal to MySQL (multiple modal on website)

Custom implementation of the linq Zip operator for different length lists

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

3 Answers 3

3 Answers 3

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

A CLEAN and SIMPLE way to add appendices to Table of Contents and bookmarks

Calculate evaluation metrics using cross_val_predict sklearn

Insert data from modal to MySQL (multiple modal on website)

3 Answers
3

3 Answers
3

3 Answers
3