Tuesday 26 June 2012

ListWeaver - WINNER

The winning code is shown below:

using System; 
using System.Collections.Generic; 
using System.Linq; 

namespace Orcomp 
{ 
    public class ListWeaver<T>
    { 
        public ListWeaver() 
        { 
            Yarns = new List<IList<T>>(); 
        } 

        public List<IList<T>> Yarns { get; set; }

        public bool CanWeave(IList<T> yarn)
        { 
            if (yarn == null)
                return false;

            List<IList<T>> yarns = Yarns.ToList(); 
            yarns.Add(yarn); 

            if (Walk(yarns) != null)
                return true;
            else 
                return false;
        } 

        public List<T> Weave() 
        { 
            return Walk(Yarns); 
        } 

        private List<T> Walk(List<IList<T>> yarns)
        { 
            var edges = new Dictionary<T, HashSet<T>>(); 
            foreach (var yarn in yarns)
            { 
                for (int i = 1; i < yarn.Count; i++)
                { 
                    if (!edges.ContainsKey(yarn[i])) 
                        edges[yarn[i]] = new HashSet<T>(); 
                    edges[yarn[i]].Add(yarn[i - 1]); 
                } 
            } 

            var path = new List<T>(edges.Count); 
            var seen = new HashSet<T>(); 

            var valid = yarns.Where(yarn => yarn.Count > 0).Select(yarn => yarn.Last()).All(node => Walk(node, path, edges, seen));
            if (valid) 
                return path; 
            else 
                return null;
        } 

        private bool Walk(T root, List<T> path, Dictionary<T, HashSet<T>> edges, HashSet<T> seen_all)
        { 
            var seen_now = new HashSet<T>(); 
            var to_visit = new LinkedList<KeyValuePair<T, bool>>(); 
            to_visit.AddLast(new KeyValuePair<T, bool>(root, false)); 

            while (to_visit.Count > 0)
            { 
                var last = to_visit.Last.Value; 
                var node = last.Key; 
                var drop = last.Value; 
                to_visit.RemoveLast(); 

                if (drop) 
                { 
                    path.Add(node); 
                    seen_now.Remove(node); 
                    continue; 
                } 

                if (seen_now.Contains(node)) 
                    return false;

                if (seen_all.Contains(node)) 
                    continue; 

                seen_all.Add(node); 
                seen_now.Add(node); 

                to_visit.AddLast(new KeyValuePair<T, bool>(node, true)); 
                if (edges.ContainsKey(node)) 
                    foreach (var edge in edges[node].Reverse())
                        to_visit.AddLast(new KeyValuePair<T, bool>(edge, false)); 
            } 

            return true; 
        } 
    } 
}

Feedback and Improvement:

The winning code runs efficiently in O(V+E) and makes very good use of HashSets and Dictionaries. (i.e. the time it takes to solve all benchmark tests is the same whether the lists are reversed or not, and the last two highly connected graphs solve efficiently in the same amount of time.) Well Done!

There is actually not much I can think of to improve on the code, however I would like to introduce another approach and start a new competition  for anyone who can solve this problem efficiently using an adjacency matrix representation of the graph using sparse matrices.

Some preliminary work suggests that if the adjacency matrix can be transformed into an upper triangular matrix a solution exists to the ListWeaver problem.

The following wikipedia page is a good starting point:
Wikipeida - Adjacency matrix

You are allowed to use 3rd party open source libraries to represent the sparse matrices, if required.


22 comments:

  1. Hello,

    Please have a look at the following test case (I can probably make it smaller if I have a closer look):

    [TestMethod]
    public void CanWeave_CollectionOfLists_ReturnsCorrectSequenceRdW()
    {
    // Arrange
    var lw = new ListWeaver();
    var list1 = new List { "Blue" };
    var list2 = new List { "Bronze", "Yellow", "Purple" };
    var list3 = new List { "Black", "Orange" };
    var list4 = new List { "Red", "Blue", "Gold" };
    var list5 = new List { "Silver", "Blue" };
    var list6 = new List { "Bronze", "Blue" };

    // Act
    lw.Yarns.Add(list1);
    lw.Yarns.Add(list2);
    lw.Yarns.Add(list3);
    lw.Yarns.Add(list4);
    lw.Yarns.Add(list5);
    lw.Yarns.Add(list6);
    var result = lw.Weave();

    var correct = new List { "Bronze", "Red", "Silver", "Blue", "Yellow", "Purple", "Black", "Orange", "Gold" };

    // Assert
    Assert.Equal(correct, result );
    }

    The competition winner returns
    { "Red", "Silver", "Bronze", "Blue", "Yellow", "Purple", "Black", "Orange", "Gold" }

    Blue is in the same location. I think Bronze should come first in the result (instead of Red), because it precedes blue in list 6 and is introduced in list2 (Red is introduced in list4).

    If the competition winner is correct, can you please explain why?

    Cheers,
    Renze.

    ReplyDelete
    Replies
    1. The result should be: Bronze, Yellow, Purple, Black, Orange, Red, Silver, Blue, Gold

      Delete
    2. Now I am confused.

      Your answer to the question convinced me that my interpretation of the problem (described in another comment as a way to determine which solution is best) was correct. The result that you present here completely shatters that notion.

      Blue is the first entry in list1, causing it to have a high ranking in the order in which the lists are processed. By necessity (I think because of what you call "natural order", only Red, Silver and Bronze need to be before it.

      However, "Blue" is put near the end of the list.

      If the result you indicate is correct, could you please give a more formal/complete description of the required result?

      Cheers,
      Renze.

      Delete
    3. Hi Renze,

      I have had more time to look at your solution and you are correct the solution should be:

      Bronze, Red, Silver, Blue, Yellow, Purple, Black, Orange, Gold

      Delete
  2. Hello,

    I found a simpler test case for which the competition winner generates a result that is not expected by me (or rather, I wrote a program that generated a simpler test case).

    Input:
    Purple
    Green
    Orange,Purple
    Green,Purple

    The competition winner produces Orange,Green,Purple.

    I think it should produce Green, Orange, Purple. Can you explain what is correct and why?

    Thank you,
    renze.

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Yes, my test gives the same results "Green, Orange, Purple". If we treat the input data as "Green goes before Orange unless clearly stated vice-versa somewhere in the lower lists" then there is no other option.

    On the other hand we have an ambiguity here. For example we can say, that "Green comes before Orange, because it is in the higher priority list than Orange" and looks like we'are right. But we can also say that "Despite Green comes in the higher priority list than Orange, Green is connected to Purple *AFTER* Orange is connected to Purple, so Green should be after Orange in the result set".

    However it looks like such ambiguity is not resolved by the problem statement. Following "vertex priority" or "edge priority" we'll come to completely different algorithms. The fact that test cases are hidden only worsen the situation.

    I propose Ben should revise the problem statement, especially because he's just posted the variation of List Weaver contest

    ReplyDelete
  5. evgeny777 is correct in his assessment: "Green comes before Orange, because it is in the higher priority list than Orange"

    The answer should be "Green, Orange, Purple".

    Please note that the winner was selected based on the unit tests that were made available with the benchmarks. The competition statement did ask for people to submit edge case tests. Some people did provide extra tests, which were also added to the competition unit tests. At the time the selection was made the winning code passed all unit tests successfully.

    ReplyDelete
    Replies
    1. Hello,

      I did not (and do not) intend to dispute who won or should have won. Looking at the winning code I figured that it was very unlikely that the result would be in line with my understanding of the problem. Because of that I put the winner and my solution side by side, wrote a program that generated inputs randomly, so the results of both solutions could be compared. The goal of this was to find the difference in interpretation to gain a better understanding of the problem.

      You now have a number of solutions that pass the unit tests. I see that you have added some examples in the latest competition, partly based on the examples in the current discussion.

      I think this development is very interesting. During software development, we base unit tests for our code partly on the problem description. However, implementation details influence what are considered to be edge case tests. Because of that, we add new unit tests once the implementation is completed (sometimes based on code coverage). When we find bugs that were not anticipated, we add unit tests as well to prevent problems from reoccuring.

      Using the same method as I did to compare two solutions, it is possible to compare all available solutions. Every solution is based on a programmer's understanding of the problem. Those that have the same understanding of the problem may have different implementations with different problem areas. By comparing the results for randomly generated test cases you can find interesting test cases. It is possible to classify test cases (one subset of the solutions passes the test case, other subsets produce the same incorrect results for the test case). This classification can be used to generate a relatively small set of test case candidates that can be inspected individually for usefulness.

      Just a thought...
      Cheers,
      Renze.

      Delete
    2. Hi Renze,

      Good comment. Your paragraph on the unit tests is absolutely accurate. Sometimes we only become aware of edge cases once the solution has been implemented.

      As to the future, we are looking at automating the whole process. Competitors will be able to submit their solutions online and see if they pass all the unit tests. If all the unit tests pass then a benchmark result will be given and the entry will be ranked against other solutions.

      We could also add some functionality for users to submit extra unit tests as well.

      This will take a bit of time to implement but if anyone is interested in collaborating on this, please send me a note from the "contact us" page.

      Delete
  6. If so then it is quite strange to select winner based on unit tests which are kept in secret. The first thing that should always be checked is algorithm *correctness*, code length and execution time should come after that. And in your previous post you actually state that winner algorithm is *NOT* correct, because in some cases it produces wrong results. I do not want to dispute the results and take away somebodys victory, but in case you continue selecting the winner according to your hidden unit tests and nothing else, you might soon find nobody taking place in your contests.

    ReplyDelete
    Replies
    1. There are a few things I would like to address:
      1) All the unit tests were made available, even the secret ones, when I published the benchmarks on the 2012.06.18. (Please see the "Update" section of the competition page.)
      2) On the bottom of the competition page I wrote: "You can also submit your own unit tests, which will be used to test other entries. Which means if you know your code can handle a tricky situation, then you can submit a test to eliminate other entries." Very few people submitted extra unit tests. But I hope everyone can realise now why this important!
      3) The winning code passed all the unit tests at the time the competition closed. With the information I had available at the time I made the best decision I could.
      4) We try hard to be as fair as possible with these competitions and as stated in the README section of the home page, there is always $100 for grabs for the person who can improve on an algorithm and this also applies here (as well as all past competitions.). So this is what I propose. I will re-open the ListWeaver competition for a week. You will be able to submit new entries and unit tests. The winner will be selected based on passing all the unit tests that have been submitted and also has the best benchmark results.

      If you guys have other suggestions please let me know.

      Delete
    2. I see you mentioned about updating the test cases and version. It there a way to subscribe to those changes, or do I have check the blog myself.

      Delete
    3. The confusion is because all of you are trying to merge impossible lists. By impossible I mean that the list(s) cannot be merged because they cannot preserve natural order of all the elements.

      Please look at all the examples that you are discussing, there is a precedence inversion while merging a list towards the end. So there cannot be a combination that will preserve natural-order. You are all trying to retro-fit the list, which cannot be weaved!

      Renze is now talking correctly about natural-order, I had mentioned same thing about natural order in the original post. I had suggested to correct and invalid test case which also merges impossible lists. I was held incorrect and and my reasoning turned down. I had also submitted a corrected test case.

      evengy777 is right and correctness should be the first criteria. However, one can only think of limited test cases and scenarios, and once your mind gets convinced about a solution, it gets stuck into a local-minima and it cannot look up the hill for the correct solution. So automated validation is the way to go.

      For such competitions, the best approach would be to have first up a code that checks the validity of the result in an automated manner. In this case, we should have automated code that checks the validity of merged lists, i.e.
      [0] Each source list has an item only once. (To catch mischevious "Red","Blue",Red"....).
      [1] All the elements in source lists are included exactly once in the merged list.
      [2] For each item in the merged list, check that first-appearance in source lists is after the first-appearance of all its preceding items.

      Suggestions:

      [1] Get the validity code in place. It's not difficult, It required I'll do it for you.
      [2] Fix all the test cases by adding the validity check at the end.

      Congratulations to the original winner bawr, I believe his was the best solution under the given scenario. Congratulations to Ben for hosting the competition and encouraging this creative thinking.

      I welcome your decision of extending the ListWeaver competition. I believe my submission was the best algorithm (as per my local-minima). I spent good amount of time for that. And I feel cheated because of ambiguity in the problem/solution.

      Delete
    4. I wouldn't say they are "impossible" as such. Here's a restatement of the original problem that is perhaps clearer:
      a) If element A appears before element B in any single given list - then A *must* appear before B in the resulting list.
      b) If element A appears before element B in a list made by joining all given lists and removing duplicates after the first occurence of every element (in short, if A was encountered before B in the whole problem) - then A *should* (i.e. it must, if possible) appear before B in the resulting list.
      c) If there's a sequence of such "first occurences" like [ .. < A < .. < B < .. C < .. ], placing A before B has higher priority than placing B before C.

      So there's a mix of strong requirements (a) and weak requirements (b & c), added to make the solution unique.

      Delete
    5. Rajesh,

      In all examples that I have seen so far, my interpretation of "natural order" has not changed. Even though the result that has been given by orc (Ben?) does not match my expectations, the mismatch is in the priority, not in the natural order.

      Please let me give you my interpretation of the problem to get you out of your local optimum. Again, my interpretation is not authorative, it is simply my interpretation.

      For any given combination of input lists I(1) to I(N), a set of lists (RESULTSET) respects the natural ordering criterium. The natural ordering criterium means that:
      - for all results R in RESULTSET:
      - for all lists I(i) (1<=i<=N):
      R reduced to the elements in I(i) will produce I(i) (in the same order)

      The result of the weave of I(1) to I(N) should be the best result according to the comparison that I indicated in another post.

      An example:
      I(1) = red
      I(2) = green, blue

      With this input, the RESULTSET would be { {red, green, blue}, {green, red, blue}, {green, blue, red} }. Using the comparison that I mentioned earlier, only {red, green, blue} remains.

      What you keep trying to do in your questions is to reduce RESULTSET to a single solution before having considered all input lists. That may lead you to the conclusion that it is not posible to weave.

      Expanding the example by adding
      I(3) = green, red
      will reduce the RESULTSET of the earlier example to { {green, red, blue}, {green, blue, red} }.

      Applying the comparison to find the best result now will result in {green, red, blue}

      So, the natural ordering will deliver a set of possibilities, the list priority reduces the set of possibilities to one single answer. Please note that I am trying to specify the desired result. I am not suggesting this as an algorithm to solve the problem (it would definitely not be a winner).

      The result that orc says is the correct result for my test case is in RESULTSET, but it is the first example that I have seen where the result is not the best result according to that ordering. Because of this discrepancy, I have asked to reconsider correctness of the result and to explain why this is the correct result.

      Delete
    6. orc (Ben),

      I fully agree with points 1, 3, and 4.

      Regarding point 2, referring to my other (somewhat lengthy) comment about unit tests:

      The problem statement itself is not enough to provide unit tests that will weed out all entries that do not work in all cases. My random test case generator has generated millions of tests, of which less than 5% caused a different result than the contest winner.

      Only looking at the problem statement, I would not consider the test case tricky:

      Input:
      Purple
      Green
      Orange,Purple
      Green,Purple

      Considering the topological sort, only {Green, Orange, Purple} and {Orange, Green, Purple} are candidates. What are the odds that another implementation would choose the incorrect one?

      For every one of these millions of test cases that were generated, I would be able to write a program that satisfies all other test cases in the bench mark, but not that particular test case.

      Would you appreciate a million extra test cases? Two million? I don't think they will make it into the test set...

      I would consider it an improvement if the entries for the competition can be compared somehow, so that intereseting entries can be found. Two ways to accomplish this are:
      1) you look at the entries to derive new test cases.
      2) the existing entries are made available to the others. To avoid reverse engineering, that would have to be in the form of (for instance) an obfuscated release mode dll.

      Delete
    7. @Renze, Bawr: I understand what you are trying to mention. My reasoning is based on the original requirment mentioned by ORC:

      [1] Sort a collection of lists respecting their "natural order", as well as the order in which lists are processed. (i.e. elements in the first list take precedence over elements in the second list etc...)

      Please note the second part about "list ordering". So in your example { {Red}, {Green, Blue}, {Green,Red} }. As Red appears in the first list, it has to the first element in the merged list. Your result of {Green, Red, Blue} voilates this criteria.

      [2] ORC in the requirement had also mentioned

      The code would be used like:
      if(listWeaver.CanWeave(newList))
      {
      listWeaver.Yarns.Add(newList);
      }

      So the third list {Green, Red} should return false for canWeave (due to ambiguity of "list-ordering"), after adding the first two.

      I agree now that the "list ordering" looks like a secondary requirement, but can it be sacrificed for first is debatable. I think Renze's definition of natural ordering is crisp and should be used for validation.

      Delete
    8. Sorry guys,I don't have a lot of time to answer all the questions at this stage. I will be back in action on Wednesday. Just quickly I will try and restate the problem.

      The competition is fundamentally a topological sort. The only problem with a topological sort is that multiple answers are possible given a number of lists. So in order to only have one answer the concept of following the order in which the yarns are added to the class was introduced. This is only a "soft" constraint.

      In other words we need to make sure that a topological sort exists for all the lists and then only return the answer that respects "as much as possible" the order in which the lists were introduced. i.e. *If possible* return elements in list 1 before those in list 2 etc.... (The emphasis is on the "If possible" term.)

      Hope this makes more sense.

      Delete
    9. In an earlier comment I mentioned that I was confused about the result that orc told me was the correct result for the test case that started this thread (now Example 2 in the adjacency matrix competition).

      The ordering that I used to reduce the RESULTSET (as mentioned in my earlier post) produced the exact same result as the ordering orc requires for all test cases and examples that were mentioned before. Even for the reduced test case (the second test case in this thread, example 1 in the adjacency matrix competition) it produced the same results.

      The ordering that I used also respects "as much as possible" the order in which the lists were introduced. I posted this ordering earlier in another comment.

      I think that I have found another interpretation that matches all examples and test cases that have been posted up until now. I have no way of knowing if it is correct. It is good until the next example comes along that shatters it.

      The point I want to make is that the description of the desired result leaves room for interpretation. It should be more precise, so that the examples are a clarification of the requirements. They should not be part of the requirements.

      If it is difficult to specify the required result more precisely, the next best thing may be to provide a reference implementation. This reference implementation should produce correct output, but not necessarily in the most efficient way.

      With a reference implementation, programmers can find test cases where their solution deviates from the required solution. It is even possible to add a random test case generator to the test set (providing a fixed seed to the random generator causes the test case to always provide the same inputs) which compares the "solution under test" outputs with the reference implementation outputs, thus reducing the possibility that another test case will pop up the next time a winner has been chosen.

      Delete
    10. Renze,

      Your interpretation is correct. My initial response was wrong. Sorry for the confusion.

      Delete
  7. The winning code - it's mine, by the way - passed all the tests that were available at the time the contest closed. This is *including* Ben's tests (which had already been made public at that time), and any *additional* tests submitted by other contestants.

    Now, I can see why it's not quite correct in the edge cases you provided (there's a simple fix, by the way, perhaps Ben wants to offer a prize for that to someone?), but again - these additional cases were provided *after* the code has been selected as the winning entry.

    If this had happened earlier, I would not have won - or rather, I would have had one more chance to fix my code, since I only used two out of three available entries.

    Best regards and best of luck on future entries!

    ReplyDelete