Ranks Of, Ordered Ranks Of and Tied Ranks Of (Statistics in Erlang part 4)

10:09 pm Erlang, Math, Programming, Statistics, Tools and Libraries

[digg-reddit-me]The upcoming Statistics in Erlang post, on correlations, requires some groundwork tools.  At this time, I don’t actually know the proper statistical names for these functions, and in fact such names probably don’t exist, so I just made some up; these functions are treated by the texts describing correlations as an inline part of the process, but the process is much easier to understand if they’re seperate, so I seperated them.

Witness ranks_of(), ordered_ranks_of() and tied_ranks_of().  On their own, their importance isn’t tremendously apparent, but when we get to correlations, it’ll suddenly become obvious.

  • ranks_of()
    • Ranks of takes a list and reverse sorts it, then returns a series of 2-tuples whose first member is the “rank” or 1-offset ordinal of the item’s list position, and whose second member is the original list value.  Erlang sort is used to order the terms, so they don’t actually need to be strictly numeric, though the function is pretty meaningless otherwise.  Soon, a version will be provided that takes a predicate for custom sorting.
    • Tied (equal) values are listed in order as separate ranks.  If you want those ties to be expressed as average values, use tied_ranks_of() below.  Because there are no ties, ranks are expressed as integers.
    • Values are presented with highest rank (rank 1, highest literal value) sorted to the front.
    • 1> scutil:ranks_of([3,8,22,8,535]).
      [{1,535},{2,22},{3,8},{4,8},{5,3}]
  • tied_ranks_of()
    • Very similar to ranks_of(), except that ranks are presented as tied when values are equivalent.  Notice how with ranks_of(), the two 8s are ranked 3 and 4; here, they are both ranked 3.5.
    • Because ties can create half-values, ranks are presented as floats.
    • 86> scutil:tied_ranks_of([3,8,22,8,535]).
      [{1.00000,535},{2.00000,22},{3.50000,8},{3.50000,8},{5.00000,3}]
  • ordered_ranks_of()
    • Correlations frequently need to be column cross-sorted.  The easiest way to deal with this is to provide a ranking function which provides the rankings in the column’s original order.  ordered_ranks_of() is tied_ranks_of() which does not alter list ordering.
    • 3> scutil:ordered_ranks_of([3,8,22,8,535]).
      [{5.00000,3},{3.50000,8},{2.00000,22},{3.50000,8},{1.00000,535}]

These functions are important building blocks to the correlations, coming up in parts 5 and 6.

As usual, this code is part of the ScUtil library. ScUtil is free and MIT license, because the GPL is evil.

This code doesn’t need any of the prior parts 1, 2, or 3 of Statistics in Erlang, but I’m linking them to make them easy to find.

This closes issue 129.

Sorry about the screwy formatting; it’s hard fitting code into a blog.  The formatting in the module is much nicer.

ranks_of(List) when is_list(List) ->
    lists:zip(lists:seq(1,length(List)),lists:reverse(lists:sort(List))).

tied_ranks_of(List) ->
    tied_rank_worker(ranks_of(List), [], no_prev_value).

tied_add_prev(Work, {FoundAt, NewValue}) ->
    lists:duplicate(
        length(FoundAt),
        {lists:sum(FoundAt) / length(FoundAt), NewValue}
    ) ++ Work.

tied_rank_worker([], Work, PrevValue) ->
    lists:reverse(tied_add_prev(Work, PrevValue));

tied_rank_worker([Item|Remainder], Work, PrevValue) ->
    case PrevValue of
        no_prev_value ->
            {BaseRank,BaseVal} = Item,
            tied_rank_worker(Remainder, Work, {[BaseRank],BaseVal});
        {FoundAt,OldVal} ->
            case Item of
                {Id,OldVal} ->
                    tied_rank_worker(
                        Remainder,
                        Work,
                        {[Id]++FoundAt,OldVal});
                {Id,NewVal} ->
                    tied_rank_worker(Remainder,
                        tied_add_prev(Work, PrevValue),
                        {[Id],NewVal})
            end
    end.

ordered_ranks_of(List) when is_list(List) ->
    ordered_ranks_of(List, tied_ranks_of(List), []).

ordered_ranks_of([], [], Work) ->
    lists:reverse(Work);

ordered_ranks_of([Front|Rem], Ranks, Work) ->
    {value,Item} = lists:keysearch(Front,2,Ranks),
    {IRank,Front} = Item,
    ordered_ranks_of(Rem, Ranks--[Item], [{IRank,Front}]++Work).

8 Responses

  1. vat Says:

    in tied_add_prev, you’re calling length multiple times unnecessarily. if FoundAt is really long, then this could be a very expensive operation.

  2. John Haugeland Says:

    Good point. I’ll fix that this weekend when I add in Anders Nygren’s replacement code and Alain O’Dea’s suggestions.

  3. Anders Nygren Says:

    This only traverses the list 2 times in stead of four.

    ranks_of(List) when is_list(List) ->
    {L,_} = lists:foldl(fun (E,{L,S}) ->
    {[{S,E}|L],S+1}
    end, {[],1}, lists:sort(fun erlang:'>' /2, List)),
    lists:reverse(L).

  4. Anders Nygren Says:

    After actually timing my previous code I found that lists:sort/2 is very slow.
    In this case it is actually better to do lists:reverse(lists:sort(List))

  5. John Haugeland Says:

    Yeah, sorts in typable languages are generally o(lg n) with a large constant, whereas lists:reverse is just a set of pointer swaps, which most CPUs have atomic ops for; avoiding sort where possible is awesome.

    Still, I can probably take some traversals out this weekend, when I get down to actually thinking about it. :D

  6. Statistical Correlations (Statistics in Erlang part 5) | Full of BS Says:

    [...] way this is achieved is through ranking, which was covered in Statistics in Erlang part 4.  The general idea is straightforward: just make a list of your values’ ranks from most [...]

  7. Spearman’s Rank Correlation Coefficient (Statistics in Erlang part 7) | Full of BS Says:

    [...] is free and MIT licensed, because the GPL is evil.  This code uses the ordered ranks code from Part 4.  This closes issue [...]

  8. Recent Links Tagged With "correlations" - JabberTags Says:

    [...] How Correlation Between Asset Classes Affects Your Portfolio Saved by OGDIABLO on Tue 11-11-2008 Ranks Of, Ordered Ranks Of and Tied Ranks Of (Statistics in Erlang … Saved by socialmediamarketing on Mon 10-11-2008 Physicists Seek Answers to Quantum Correlations [...]

Leave a Comment

Your comment

You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Please note: Comment moderation is enabled and may delay your comment. There is no need to resubmit your comment.