Ranks Of, Ordered Ranks Of and Tied Ranks Of (Statistics in Erlang part 4)
August 14, 2008 10:09 pm Erlang, Math, Programming, Statistics, Tools and Libraries[digg-reddit-me]The upcoming Statistics in Erlang post, on correlations, requires some groundwork tools. At this time, I don’t actually know the proper statistical names for these functions, and in fact such names probably don’t exist, so I just made some up; these functions are treated by the texts describing correlations as an inline part of the process, but the process is much easier to understand if they’re seperate, so I seperated them.
Witness ranks_of(), ordered_ranks_of() and tied_ranks_of(). On their own, their importance isn’t tremendously apparent, but when we get to correlations, it’ll suddenly become obvious.
- ranks_of()
- Ranks of takes a list and reverse sorts it, then returns a series of 2-tuples whose first member is the “rank” or 1-offset ordinal of the item’s list position, and whose second member is the original list value. Erlang sort is used to order the terms, so they don’t actually need to be strictly numeric, though the function is pretty meaningless otherwise. Soon, a version will be provided that takes a predicate for custom sorting.
- Tied (equal) values are listed in order as separate ranks. If you want those ties to be expressed as average values, use tied_ranks_of() below. Because there are no ties, ranks are expressed as integers.
- Values are presented with highest rank (rank 1, highest literal value) sorted to the front.
- 1> scutil:ranks_of([3,8,22,8,535]).
[{1,535},{2,22},{3,8},{4,8},{5,3}]
- tied_ranks_of()
- Very similar to ranks_of(), except that ranks are presented as tied when values are equivalent. Notice how with ranks_of(), the two 8s are ranked 3 and 4; here, they are both ranked 3.5.
- Because ties can create half-values, ranks are presented as floats.
- 86> scutil:tied_ranks_of([3,8,22,8,535]).
[{1.00000,535},{2.00000,22},{3.50000,8},{3.50000,8},{5.00000,3}]
- ordered_ranks_of()
- Correlations frequently need to be column cross-sorted. The easiest way to deal with this is to provide a ranking function which provides the rankings in the column’s original order. ordered_ranks_of() is tied_ranks_of() which does not alter list ordering.
- 3> scutil:ordered_ranks_of([3,8,22,8,535]).
[{5.00000,3},{3.50000,8},{2.00000,22},{3.50000,8},{1.00000,535}]
These functions are important building blocks to the correlations, coming up in parts 5 and 6.
As usual, this code is part of the ScUtil library. ScUtil is free and MIT license, because the GPL is evil.
This code doesn’t need any of the prior parts 1, 2, or 3 of Statistics in Erlang, but I’m linking them to make them easy to find.
This closes issue 129.
Sorry about the screwy formatting; it’s hard fitting code into a blog. The formatting in the module is much nicer.
ranks_of(List) when is_list(List) ->
lists:zip(lists:seq(1,length(List)),lists:reverse(lists:sort(List))).
tied_ranks_of(List) ->
tied_rank_worker(ranks_of(List), [], no_prev_value).
tied_add_prev(Work, {FoundAt, NewValue}) ->
lists:duplicate(
length(FoundAt),
{lists:sum(FoundAt) / length(FoundAt), NewValue}
) ++ Work.
tied_rank_worker([], Work, PrevValue) ->
lists:reverse(tied_add_prev(Work, PrevValue));
tied_rank_worker([Item|Remainder], Work, PrevValue) ->
case PrevValue of
no_prev_value ->
{BaseRank,BaseVal} = Item,
tied_rank_worker(Remainder, Work, {[BaseRank],BaseVal});
{FoundAt,OldVal} ->
case Item of
{Id,OldVal} ->
tied_rank_worker(
Remainder,
Work,
{[Id]++FoundAt,OldVal});
{Id,NewVal} ->
tied_rank_worker(Remainder,
tied_add_prev(Work, PrevValue),
{[Id],NewVal})
end
end.
ordered_ranks_of(List) when is_list(List) ->
ordered_ranks_of(List, tied_ranks_of(List), []).
ordered_ranks_of([], [], Work) ->
lists:reverse(Work);
ordered_ranks_of([Front|Rem], Ranks, Work) ->
{value,Item} = lists:keysearch(Front,2,Ranks),
{IRank,Front} = Item,
ordered_ranks_of(Rem, Ranks--[Item], [{IRank,Front}]++Work).

August 14th, 2008 at 10:22 pm
in tied_add_prev, you’re calling length multiple times unnecessarily. if FoundAt is really long, then this could be a very expensive operation.
August 14th, 2008 at 10:29 pm
Good point. I’ll fix that this weekend when I add in Anders Nygren’s replacement code and Alain O’Dea’s suggestions.
August 15th, 2008 at 4:27 am
This only traverses the list 2 times in stead of four.
ranks_of(List) when is_list(List) ->
{L,_} = lists:foldl(fun (E,{L,S}) ->
{[{S,E}|L],S+1}
end, {[],1}, lists:sort(fun erlang:'>' /2, List)),
lists:reverse(L).
August 15th, 2008 at 7:11 am
After actually timing my previous code I found that lists:sort/2 is very slow.
In this case it is actually better to do lists:reverse(lists:sort(List))
August 15th, 2008 at 9:27 am
Yeah, sorts in typable languages are generally o(lg n) with a large constant, whereas lists:reverse is just a set of pointer swaps, which most CPUs have atomic ops for; avoiding sort where possible is awesome.
Still, I can probably take some traversals out this weekend, when I get down to actually thinking about it.
August 23rd, 2008 at 4:45 pm
[...] way this is achieved is through ranking, which was covered in Statistics in Erlang part 4. The general idea is straightforward: just make a list of your values’ ranks from most [...]
August 24th, 2008 at 10:36 am
[...] is free and MIT licensed, because the GPL is evil. This code uses the ordered ranks code from Part 4. This closes issue [...]
November 30th, 2008 at 6:49 am
[...] How Correlation Between Asset Classes Affects Your Portfolio Saved by OGDIABLO on Tue 11-11-2008 Ranks Of, Ordered Ranks Of and Tied Ranks Of (Statistics in Erlang … Saved by socialmediamarketing on Mon 10-11-2008 Physicists Seek Answers to Quantum Correlations [...]