<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Full of BS &#187; Math</title>
	<atom:link href="http://fullof.bs/category/math/feed/" rel="self" type="application/rss+xml" />
	<link>http://fullof.bs</link>
	<description>He just never stops talking</description>
	<lastBuildDate>Mon, 19 Oct 2009 15:08:53 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Moments, Skewness and Kurtosis (Statistics in Erlang part 8)</title>
		<link>http://fullof.bs/moments-skewness-and-kurtosis-statistics-in-erlang-part-8/</link>
		<comments>http://fullof.bs/moments-skewness-and-kurtosis-statistics-in-erlang-part-8/#comments</comments>
		<pubDate>Thu, 28 Aug 2008 02:43:28 +0000</pubDate>
		<dc:creator>John Haugeland</dc:creator>
				<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tools and Libraries]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[central moment]]></category>
		<category><![CDATA[kurtosis]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[moment]]></category>
		<category><![CDATA[skewness]]></category>
		<category><![CDATA[stats]]></category>

		<guid isPermaLink="false">http://fullof.bs/?p=365</guid>
		<description><![CDATA[So, it was pointed out to me that I had the central moments, but not the moments &#8211; i.e., the ones not centered on the input&#8217;s average.  Also, it was pointed out that most people don&#8217;t know that kurtosis and skewness are related to the central moments.  Furthermore, it turns out (and I didn&#8217;t know [...]]]></description>
			<content:encoded><![CDATA[<p>So, it was pointed out to me that I had the central moments, but not the moments &#8211; i.e., the ones not centered on the input&#8217;s average.  Also, it was pointed out that most people don&#8217;t know that kurtosis and skewness are related to the central moments.  Furthermore, it turns out (and I didn&#8217;t know this) that there are in fact meaningful uses of floating-point exponents in moments.</p>
<p>So, I implemented moments, I replaced my central moments implementation, and I gave name wrappers for skewness and kurtosis to make them easier to identify.</p>
<p>This closes issues <a title="Crunchy Development issue 169, Skewness" href="http://crunchyd.com/forum/project.php?issueid=169" target="_blank">169</a>, <a title="Crunchy Development issue 170, Kurtosis" href="http://crunchyd.com/forum/project.php?issueid=170" target="_blank">170</a>, <a title="Crunchy Development issue 171, Moments" href="http://crunchyd.com/forum/project.php?issueid=171" target="_blank">171</a>, <a title="Crunchy Development issue 172, Real valued moments" href="http://crunchyd.com/forum/project.php?issueid=172" target="_blank">172</a> and <a title="Crunchy Development issue 173, Real valued central moments" href="http://crunchyd.com/forum/project.php?issueid=173" target="_blank">173</a>.  As usual, this code is part of <a title="The ScUtil Library" href="http://scutil.com/" target="_blank">the ScUtil library</a>, which is free and MIT licensed, because the GPL is evil.</p>
<pre style="padding-left: 30px">moment(List, N) when is_list(List), is_number(N) -&gt;
    scutil:arithmetic_mean( [ pow(Item, N) || Item &lt;- List ] ).

moments(List) -&gt;
    moments(List, [2,3,4]).

moments(List, Moments) when is_list(Moments) -&gt;
    [ moment(List, M) || M &lt;- Moments ].

central_moment(List, N) when is_list(List), is_number(N) -&gt;
    ListAMean = scutil:arithmetic_mean(List),
    scutil:arithmetic_mean( [ pow(Item-ListAMean, N) || Item &lt;- List ] ).

central_moments(List) -&gt;
    central_moments(List, [2,3,4]).

central_moments(List, Moments) when is_list(Moments) -&gt;
    [ central_moment(List, M) || M &lt;- Moments ].

skewness(List) -&gt; central_moment(List, 3).
kurtosis(List) -&gt; central_moment(List, 4).</pre>
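<p>One caveat on the name wrappers: many statistics texts define skewness and kurtosis as the third and fourth central moments divided by the standard deviation cubed and to the fourth power, so that they come out unitless.  The following is a minimal sketch of that standardized form, not the ScUtil code; the module name moment_sketch, the helper mean/1 and the functions std_skewness/1 and std_kurtosis/1 are made up for illustration, and math:pow is assumed as a stand-in for the pow call above.</p>
<pre style="padding-left: 30px">-module(moment_sketch).
-export([mean/1, central_moment/2, std_skewness/1, std_kurtosis/1]).

%% Plain arithmetic mean; assumed equivalent to scutil:arithmetic_mean/1.
mean(List) when is_list(List) -&gt;
    lists:sum(List) / length(List).

%% Nth moment about the mean, same idea as central_moment/2 above.
central_moment(List, N) when is_list(List), is_number(N) -&gt;
    Mean = mean(List),
    mean([ math:pow(Item - Mean, N) || Item &lt;- List ]).

%% Third central moment divided by the standard deviation cubed.
%% Fails with badarith if every item is identical (zero variance).
std_skewness(List) -&gt;
    central_moment(List, 3) / math:pow(central_moment(List, 2), 1.5).

%% Fourth central moment divided by the variance squared.
std_kurtosis(List) -&gt;
    M2 = central_moment(List, 2),
    central_moment(List, 4) / (M2 * M2).</pre>
<p>Under these definitions a symmetric list such as [1,2,3,4,5] has a standardized skewness of 0.0 and a standardized kurtosis of about 1.7.</p>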
]]></content:encoded>
			<wfw:commentRss>http://fullof.bs/moments-skewness-and-kurtosis-statistics-in-erlang-part-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spearman&#8217;s Rank Correlation Coefficient (Statistics in Erlang part 7)</title>
		<link>http://fullof.bs/spearmans-rank-correlation-coefficient-statistics-in-erlang-part-7/</link>
		<comments>http://fullof.bs/spearmans-rank-correlation-coefficient-statistics-in-erlang-part-7/#comments</comments>
		<pubDate>Sun, 24 Aug 2008 16:29:33 +0000</pubDate>
		<dc:creator>John Haugeland</dc:creator>
				<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tools and Libraries]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[mit license]]></category>
		<category><![CDATA[spearman]]></category>
		<category><![CDATA[spearman correlation]]></category>
		<category><![CDATA[spearman rank]]></category>
		<category><![CDATA[spearman rho]]></category>
		<category><![CDATA[spearman's rank correlation coefficient]]></category>
		<category><![CDATA[stats]]></category>

		<guid isPermaLink="false">http://fullof.bs/?p=353</guid>
		<description><![CDATA[Spearman&#8217;s Rank Correlation Coefficient &#8211; usually just called The Spearman Correlation, sometimes Spearman&#8217;s Rank or Spearman&#8217;s Rho &#8211; is a method of determining the similarity between two numeric sets (we also go over the Pearson and the Kendall).  If you don&#8217;t know what a correlation is, start with Statistics in Erlang part 5, which covers [...]]]></description>
			<content:encoded><![CDATA[<p>Spearman&#8217;s Rank Correlation Coefficient &#8211; usually just called The Spearman Correlation, sometimes Spearman&#8217;s Rank or Spearman&#8217;s Rho &#8211; is a method of determining the similarity between two numeric sets (we also go over the <a title="Statistics in Erlang part 6" href="http://fullof.bs/the-pearson-correlation-coefficient-statistics-in-erlang-part-6" target="_blank">Pearson</a> and the Kendall).  If you don&#8217;t know what a correlation is, start with <a title="Statistics in Erlang part 5" href="http://fullof.bs/statistical-correlations-statistics-in-erlang-part-5" target="_blank">Statistics in Erlang part 5</a>, which covers the basic idea of correlations.  A good math-based explanation is available at <a title="Hyperstat - Spearman's Rho" href="http://davidmlane.com/hyperstat/A62436.html" target="_blank">David M Lane&#8217;s Hyperstat</a>.</p>
<p>In English, for two lists of length N, you rank each list, take the square of the difference between the ranks in each matched row, sum those squares, multiply by six, divide by N cubed minus N, and subtract the result from one.  This provides the rank correlation coefficient (Spearman&#8217;s rho), which is over the interval [-1, 1].  Note that the tuple returned by the code below is labelled with the atom rsquared, but the value is rho itself rather than its square, which is why it can come out negative.  The shortcut formula is exact only when there are no tied ranks; with ties it is an approximation.</p>
<p>Sorry again about the screwy formatting; I&#8217;m just getting things to fit on the blog.  The stuff in the library is better formatted.</p>
<pre style="padding-left: 30px">spearman_correlation(List1, List2) when
    is_list(List1), is_list(List2),
    length(List1) /= length(List2) -&gt; {error, lists_must_be_same_length};

spearman_correlation(List1, List2) when is_list(List1), is_list(List2) -&gt;

    {TR1,_} = lists:unzip(ordered_ranks_of(List1)),
    {TR2,_} = lists:unzip(ordered_ranks_of(List2)),

    Numerator   = 6 * lists:sum([ (D1-D2)*(D1-D2)
                               || {D1,D2} &lt;- lists:zip(TR1,TR2) ]),
    Denominator = math:pow(length(List1),3)-length(List1),

    {rsquared,1-(Numerator/Denominator)}.</pre>
<p>Test data is available at <a title="Geography Fieldwork - Spearman's Rank" href="http://geographyfieldwork.com/SpearmansRank.htm" target="_blank">Geography Fieldwork</a>.</p>
<pre style="padding-left: 30px"><span style="color: #008080"><strong><span style="color: #000080">1&gt; X = [50,175,270,375,425,580,710,790,890,980].</span>
[50,175,270,375,425,580,710,790,890,980]
<span style="color: #000080">2&gt; Y = [1.80,1.20,2.00,1.00,1.00,1.20,0.80,0.60,1.00,0.85].</span>
[1.80000, 1.20000, 2.00000, 1.00000, 1.00000,
 1.20000, 0.800000, 0.600000, 1.00000, 0.850000]
<span style="color: #000080">3&gt; scutil:spearman_correlation(X,Y).</span>
{rsquared,-0.730303}</strong></span></pre>
<p>This code is part of <a title="The ScUtil Library" href="http://scutil.com/" target="_blank">the ScUtil library</a>.  This code is free and MIT licensed, because the GPL is evil.  This code uses the ordered ranks code from <a title="Statistics in Erlang part 4" href="http://fullof.bs/ranks-of-ordered-ranks-of-and-tied-ranks-of-statistics-in-erlang-part-4" target="_blank">Part 4</a>.  This closes <a title="Crunchy Development issue 139, Spearman's Rank" href="http://crunchyd.com/forum/project.php?issueid=139" target="_blank">issue 139</a>.</p>
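<p>To see the arithmetic on its own, here is a small sketch that takes two already-ranked lists and returns every intermediate value.  It is an illustration rather than library code: the module name spearman_sketch and the function rho_steps/2 are invented here, and it assumes untied ranks and at least two rows.</p>
<pre style="padding-left: 30px">-module(spearman_sketch).
-export([rho_steps/2]).

%% Ranks1 and Ranks2 are untied rank lists, in the same row order.
%% Returns {SumD2, Numerator, Denominator, Rho} so each step is visible.
rho_steps(Ranks1, Ranks2) when length(Ranks1) =:= length(Ranks2),
                               length(Ranks1) &gt; 1 -&gt;
    N           = length(Ranks1),
    SumD2       = lists:sum([ (A-B)*(A-B)
                              || {A,B} &lt;- lists:zip(Ranks1, Ranks2) ]),
    Numerator   = 6 * SumD2,
    Denominator = N*N*N - N,
    {SumD2, Numerator, Denominator, 1 - Numerator/Denominator}.</pre>
<p>For the fully reversed ranks [1,2,3] and [3,2,1], the squared differences are 4, 0 and 4, so spearman_sketch:rho_steps([1,2,3],[3,2,1]) gives {8,48,24,-1.0}: a perfect inverse correlation.</p>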
]]></content:encoded>
			<wfw:commentRss>http://fullof.bs/spearmans-rank-correlation-coefficient-statistics-in-erlang-part-7/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Pearson Correlation Coefficient (Statistics in Erlang part 6)</title>
		<link>http://fullof.bs/the-pearson-correlation-coefficient-statistics-in-erlang-part-6/</link>
		<comments>http://fullof.bs/the-pearson-correlation-coefficient-statistics-in-erlang-part-6/#comments</comments>
		<pubDate>Sun, 24 Aug 2008 01:46:39 +0000</pubDate>
		<dc:creator>John Haugeland</dc:creator>
				<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tools and Libraries]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[correlation]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[pearson correlation]]></category>
		<category><![CDATA[pearson correlation coefficient]]></category>
		<category><![CDATA[stats]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://fullof.bs/?p=343</guid>
		<description><![CDATA[I went over much of the concept of correlations in Part 5; if you don&#8217;t know what statistical correlations are, you should read part 5 first.
The Pearson Correlation Coefficient is one method of generating the correlation of sets.  You can get a good math-based explanation at David Lane&#8217;s Hyperstat.
In English, basically, you take two numeric [...]]]></description>
			<content:encoded><![CDATA[<p>I went over much of the concept of correlations in <a title="Statistics in Erlang part 5" href="http://fullof.bs/statistical-correlations-statistics-in-erlang-part-5" target="_blank">Part 5</a>; if you don&#8217;t know what statistical correlations are, you should read part 5 first.</p>
<p>The Pearson Correlation Coefficient is one method of generating the correlation of sets.  You can get a good math-based explanation at <a title="Hyperstat - Computing Pearson's Correlation" href="http://davidmlane.com/hyperstat/A51911.html" target="_blank">David Lane&#8217;s Hyperstat</a>.</p>
<p>In English, basically, you take two numeric lists of the same length, X and Y, then calculate five sums and a length from them:</p>
<ol>
<li>The sum of the items in List X (SumX)</li>
<li>The sum of the items in List Y (SumY)</li>
<li>The sum of the squares of the items in List X (SumXX)</li>
<li>The sum of the squares of the items in List Y (SumYY)</li>
<li>The sum of the products of the matched items in Lists X and Y (SumXY)</li>
<li>The length of the lists, which should be the same (N)</li>
</ol>
<p>Using those, you can construct a formula which is honestly best expressed in code:</p>
<pre style="padding-left: 30px">pearson_correlation(List1, List2) when is_list(List1), is_list(List2) -&gt;

    SumXY = lists:sum([A*B || {A,B} &lt;- lists:zip(List1,List2) ]),

    SumX  = lists:sum(List1),
    SumY  = lists:sum(List2),

    SumXX = lists:sum([L*L || L&lt;-List1]),
    SumYY = lists:sum([L*L || L&lt;-List2]),

    N     = length(List1),

    Numer = (N*SumXY) - (SumX * SumY),
    Denom = math:sqrt(((N*SumXX)-(SumX*SumX)) * ((N*SumYY)-(SumY*SumY))),

    {r, (Numer/Denom)}.</pre>
<p>This code is part of the <a title="The ScUtil Library" href="http://scutil.com/" target="_blank">ScUtil Library</a>.  The ScUtil library is free and MIT licensed, because the GPL is evil.</p>
<pre style="padding-left: 30px"><span style="color: #008080"><strong><span style="color: #000080">1&gt; X = [1,3,5,6,8,9,6,4,3,2].  </span>    </strong>
[1,3,5,6,8,9,6,4,3,2]
<strong><span style="color: #000080">2&gt; Y = [2,5,6,6,7,7,5,3,1,1].</span>  </strong>
[2,5,6,6,7,7,5,3,1,1]
<span style="color: #000080"><strong>3&gt; scutil:pearson_correlation(X,Y).</strong></span>
{r,0.854706}</span></pre>
<p>Verification of test data is available at <a title="Changing Minds - Pearson Correlation" href="http://changingminds.org/explanations/research/analysis/pearson.htm" target="_blank">Changing Minds</a>.  This closes <a title="Crunchy Development issue 140, Pearson Correlation" href="http://crunchyd.com/forum/project.php?issueid=140" target="_blank">issue 140</a>.</p>
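<p>If you want to check the six steps by hand, the sketch below exposes the intermediate sums instead of only the final coefficient.  It is a hedged illustration, not the ScUtil implementation: the module name pearson_sketch and the functions sums/2 and r/2 are invented for this example, and the same-length guard is an addition.</p>
<pre style="padding-left: 30px">-module(pearson_sketch).
-export([sums/2, r/2]).

%% Returns {SumX, SumY, SumXX, SumYY, SumXY, N} for two equal-length lists.
sums(Xs, Ys) when is_list(Xs), is_list(Ys), length(Xs) =:= length(Ys) -&gt;
    { lists:sum(Xs),
      lists:sum(Ys),
      lists:sum([ X*X || X &lt;- Xs ]),
      lists:sum([ Y*Y || Y &lt;- Ys ]),
      lists:sum([ X*Y || {X,Y} &lt;- lists:zip(Xs, Ys) ]),
      length(Xs) }.

%% Same formula as pearson_correlation/2 above, built from sums/2.
%% Fails with badarith if either list has zero variance.
r(Xs, Ys) -&gt;
    {SumX, SumY, SumXX, SumYY, SumXY, N} = sums(Xs, Ys),
    Numer = (N*SumXY) - (SumX*SumY),
    Denom = math:sqrt(((N*SumXX)-(SumX*SumX)) * ((N*SumYY)-(SumY*SumY))),
    Numer / Denom.</pre>
<p>For the X and Y lists in the shell session above, sums/2 gives {47,43,281,235,249,10}, so the numerator is 2490 - 2021 = 469 and r/2 comes out at roughly 0.8547, in agreement with the library result.</p>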
]]></content:encoded>
			<wfw:commentRss>http://fullof.bs/the-pearson-correlation-coefficient-statistics-in-erlang-part-6/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Statistical Correlations (Statistics in Erlang part 5)</title>
		<link>http://fullof.bs/statistical-correlations-statistics-in-erlang-part-5/</link>
		<comments>http://fullof.bs/statistical-correlations-statistics-in-erlang-part-5/#comments</comments>
		<pubDate>Sat, 23 Aug 2008 22:45:05 +0000</pubDate>
		<dc:creator>John Haugeland</dc:creator>
				<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tools and Libraries]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[correlation]]></category>
		<category><![CDATA[kendall correlation]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[pearson correlation]]></category>
		<category><![CDATA[spearman correlation]]></category>
		<category><![CDATA[stats]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://fullof.bs/?p=326</guid>
		<description><![CDATA[Since most of my readers are programmers, I&#8217;m going to explain this in programmer-speak.  Also, it&#8217;s damned hard to find a non-math explanation of this stuff.
The general idea with correlations is simple: we want to measure how closely changes in one set track changes in the other set &#8211; that is, their correlation.  Correlations aren&#8217;t about the [...]]]></description>
			<content:encoded><![CDATA[<p>Since most of my readers are programmers, I&#8217;m going to explain this in programmer-speak.  Also, it&#8217;s damned hard to find a non-math explanation of this stuff.</p>
<p>The general idea with correlations is simple: we want to measure how closely changes in one set track changes in the other set &#8211; that is, their correlation.  Correlations aren&#8217;t about the actual values involved in the two columns, so much as how they seem to move together.</p>
<p>A simple example is the edge size and volume of a cube.  As the edge size goes up, so will the volume.  To that end, if you make a two-column table where one column is edge size and the other volume (or, for that matter, face size works too), and then the rows are just a bunch of example data, you would want to see a &#8220;perfect correlation&#8221; &#8211; without fail, the change in column 1 should show a perfect match for changes in column 2.  For a perfect match like that, you get a correlation of 1.0.  Similarly, if you measure the average density of a fixed number of particles in that space, as the edge size goes up, the average density goes down; you would see a &#8220;perfect inverse&#8221; correlation, or a value of -1.0.  If you measure two values which aren&#8217;t correlated &#8211; where values in one column don&#8217;t seem to affect values in the other &#8211; you should get a value at or near zero.</p>
<p>The purpose of the correlation coefficient is to tell how strongly two columns are correlated, as well as whether their correlation is positive (similar) or negative (inverse).  You can use it to determine whether sets of measurements are related.</p>
<p>Consider, for example, a table of height and weight among a distribution of people.  One expects a strong correlation, but not perfect; some people are over- and under-weight for their height.  The closer that measurement comes to 1, the less the outside factors matter.  The closer that measurement comes to zero, the less dominant the measured term is in the measured result.  In practical terms, if you see (for example) a stronger correlation between users of Medicine X and outbreaks of Symptom Y than in the general population, that suggests Medicine X may have Symptom Y as a long-term ramification, though a correlation on its own doesn&#8217;t prove that one causes the other.</p>
<p>The way this is achieved, for the rank-based methods (Spearman and Kendall), is through ranking, which was covered in <a title="Statistics in Erlang part 4" href="http://fullof.bs/ranks-of-ordered-ranks-of-and-tied-ranks-of-statistics-in-erlang-part-4" target="_blank">Statistics in Erlang part 4</a>; the Pearson works on the raw values instead.  The general idea is straightforward: just make a list of your values&#8217; ranks from most significant (usually largest) to least, starting counting at 1.  Do that for both columns, then sort by the first column (keeping each row&#8217;s pair together, of course).  At that point, what you actually do to measure the correlation varies from method to method, but the general landscape of things should now be apparent: we&#8217;re just measuring how much the difference in rank varies when sorted by one column.</p>
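<p>To make the cube example concrete, here is a rough sketch that ranks edge sizes, volumes and densities and compares the ranks.  Everything in it is an assumption for illustration: the names cube_sketch, ranks/1, rho/2 and demo/0 are invented, the 1000 particles are a made-up figure, the ranking assumes no tied values, and rho/2 uses the Spearman shortcut formula as one possible way of scoring the rank differences.</p>
<pre style="padding-left: 30px">-module(cube_sketch).
-export([ranks/1, rho/2, demo/0]).

%% Rank of each item, kept in the original row order; 1 = largest.
%% Assumes no tied values.
ranks(List) -&gt;
    Sorted = lists:reverse(lists:sort(List)),
    Ranked = lists:zip(Sorted, lists:seq(1, length(List))),
    [ element(2, lists:keyfind(Item, 1, Ranked)) || Item &lt;- List ].

%% Spearman shortcut formula over the two rank columns.
rho(Xs, Ys) when length(Xs) =:= length(Ys), length(Xs) &gt; 1 -&gt;
    N     = length(Xs),
    SumD2 = lists:sum([ (A-B)*(A-B)
                        || {A,B} &lt;- lists:zip(ranks(Xs), ranks(Ys)) ]),
    1 - (6 * SumD2) / (N*N*N - N).

%% Edge size against volume, then edge size against particle density.
demo() -&gt;
    Edges     = [1, 2, 3, 4, 5],
    Volumes   = [ E*E*E || E &lt;- Edges ],
    Densities = [ 1000 / V || V &lt;- Volumes ],
    { rho(Edges, Volumes), rho(Edges, Densities) }.</pre>
<p>cube_sketch:demo() returns {1.0,-1.0}: volume rises in lockstep with edge size, and density falls in lockstep with it.</p>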
<p>There are several ways to get such correlations.  We&#8217;re going to go over the big three &#8211; the Pearson Correlation Coefficient, the Kendall Tau Rank Correlation, and the Spearman Rank Correlation Coefficient.  Each one is covered in its own tutorial: <a title="Statistics in Erlang part 6" href="http://fullof.bs/the-pearson-correlation-coefficient-statistics-in-erlang-part-6" target="_blank">Pearson Correlation in Erlang</a> (part 6), <a title="Statistics in Erlang part 7" href="http://fullof.bs/spearmans-rank-correlation-coefficient-statistics-in-erlang-part-7" target="_blank">Spearman Correlation in Erlang</a> (part 7) and Kendall Correlation in Erlang (a later part).</p>
]]></content:encoded>
			<wfw:commentRss>http://fullof.bs/statistical-correlations-statistics-in-erlang-part-5/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Ranks Of, Ordered Ranks Of and Tied Ranks Of (Statistics in Erlang part 4)</title>
		<link>http://fullof.bs/ranks-of-ordered-ranks-of-and-tied-ranks-of-statistics-in-erlang-part-4/</link>
		<comments>http://fullof.bs/ranks-of-ordered-ranks-of-and-tied-ranks-of-statistics-in-erlang-part-4/#comments</comments>
		<pubDate>Fri, 15 Aug 2008 04:09:49 +0000</pubDate>
		<dc:creator>John Haugeland</dc:creator>
				<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tools and Libraries]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[ordered ranks of]]></category>
		<category><![CDATA[ranks of]]></category>
		<category><![CDATA[scutil]]></category>
		<category><![CDATA[stats]]></category>
		<category><![CDATA[tied ranks of]]></category>

		<guid isPermaLink="false">http://fullof.bs/?p=323</guid>
		<description><![CDATA[The upcoming Statistics in Erlang post, on correlations, requires some groundwork tools.  At this time, I don&#8217;t actually know the proper statistical names for these functions, and in fact such names probably don&#8217;t exist, so I just made some up; these functions are treated by the texts describing correlations as an inline part of the [...]]]></description>
			<content:encoded><![CDATA[<p>The upcoming Statistics in Erlang post, on correlations, requires some groundwork tools.  At this time, I don&#8217;t actually know the proper statistical names for these functions, and in fact such names probably don&#8217;t exist, so I just made some up; these functions are treated by the texts describing correlations as an inline part of the process, but the process is much easier to understand if they&#8217;re separate, so I separated them.</p>
<p>Witness ranks_of(), ordered_ranks_of() and tied_ranks_of().  On their own, their importance isn&#8217;t tremendously apparent, but when we get to correlations, it&#8217;ll suddenly become obvious.</p>
<ul>
<li>ranks_of()
<ul>
<li>Ranks of takes a list and reverse sorts it, then returns a series of 2-tuples whose first member is the &#8220;rank&#8221; or 1-offset ordinal of the item&#8217;s list position, and whose second member is the original list value.  Erlang sort is used to order the terms, so they don&#8217;t actually need to be strictly numeric, though the function is pretty meaningless otherwise.  Soon, a version will be provided that takes a predicate for custom sorting.</li>
<li>Tied (equal) values are listed in order as separate ranks.  If you want those ties to be expressed as average values, use tied_ranks_of() below.  Because there are no ties, ranks are expressed as integers.</li>
<li>Values are presented with highest rank (rank 1, highest literal value) sorted to the front.</li>
<li><span style="color: #0000ff"><strong><span style="color: #008000">1&gt; scutil:ranks_of([3,8,22,8,535]).</span><br />
[{1,535},{2,22},{3,8},{4,8},{5,3}]</strong></span></li>
</ul>
</li>
<li>tied_ranks_of()
<ul>
<li>Very similar to ranks_of(), except that ranks are presented as tied when values are equivalent.  Notice how with ranks_of(), the two 8s are ranked 3 and 4; here, they are both ranked 3.5.</li>
<li>Because ties can create half-values, ranks are presented as floats.</li>
<li><span style="color: #0000ff"><strong><span style="color: #008000">86&gt; scutil:tied_ranks_of([3,8,22,8,535]).<br />
<span style="color: #0000ff">[{1.00000,535},{2.00000,22},{3.50000,8},{3.50000,8},{5.00000,3}]</span></span></strong></span></li>
</ul>
</li>
<li>ordered_ranks_of()
<ul>
<li>Correlations frequently need to be column cross-sorted.  The easiest way to deal with this is to provide a ranking function which provides the rankings in the column&#8217;s original order.  ordered_ranks_of() is tied_ranks_of() which does not alter list ordering.</li>
<li><span style="color: #0000ff"><strong><span style="color: #008000"><span style="color: #008000">3</span><span style="color: #0000ff"><span style="color: #008000">&gt; scutil:ordered_ranks_of([3,8,22,8,535]).</span><br />
[{5.00000,3},{3.50000,8},{2.00000,22},{3.50000,8},{1.00000,535}]</span></span></strong></span></li>
</ul>
</li>
</ul>
<p>These functions are important building blocks to the correlations, coming up in parts 5 and 6.</p>
<p><span class="entry">As usual, <a title="StoneCypher's Utility Library" href="http://scutil.com/" target="_blank">this code is part of the ScUtil library</a>.  ScUtil is free and MIT licensed, because the GPL is evil.</span></p>
<p>This code doesn&#8217;t need any of the prior parts <a title="Part 1 of Statistics in Erlang" href="http://fullof.bs/arithmetic-mean-geometric-mean-harmonic-mean-and-weighted-arithmetic-mean-statistics-in-erlang-part" target="_blank">1</a>, <a title="Part 2 of Statistics in Erlang" href="http://fullof.bs/mean-median-mode-and-histograph-statistics-in-erlang-part-2" target="_blank">2</a>, or <a title="Part 3 of Statistics in Erlang" href="http://fullof.bs/standard-deviation-root-mean-square-and-the-central-moments-statistics-in-erlang-part-3" target="_blank">3</a> of Statistics in Erlang, but I&#8217;m linking them to make them easy to find.</p>
<p>This closes issue <a title="Issue 129 at Crunchy Development" href="http://crunchyd.com/forum/project.php?issueid=129" target="_blank">129</a>.</p>
<p>Sorry about the screwy formatting; it&#8217;s hard fitting code into a blog.  The formatting in the module is much nicer.</p>
<pre style="padding-left: 30px">ranks_of(List) when is_list(List) -&gt;
    lists:zip(lists:seq(1,length(List)),lists:reverse(lists:sort(List))).

tied_ranks_of(List) -&gt;
    tied_rank_worker(ranks_of(List), [], no_prev_value).

tied_add_prev(Work, {FoundAt, NewValue}) -&gt;
    lists:duplicate(
        length(FoundAt),
        {lists:sum(FoundAt) / length(FoundAt), NewValue}
    ) ++ Work.

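tied_rank_worker([], Work, no_prev_value) -&gt;
    lists:reverse(Work);   % empty input: there is no pending value to flush

tied_rank_worker([], Work, PrevValue) -&gt;
    lists:reverse(tied_add_prev(Work, PrevValue));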

tied_rank_worker([Item|Remainder], Work, PrevValue) -&gt;
    case PrevValue of
        no_prev_value -&gt;
            {BaseRank,BaseVal} = Item,
            tied_rank_worker(Remainder, Work, {[BaseRank],BaseVal});
        {FoundAt,OldVal} -&gt;
            case Item of
                {Id,OldVal} -&gt;
                    tied_rank_worker(
                        Remainder,
                        Work,
                        {[Id]++FoundAt,OldVal});
                {Id,NewVal} -&gt;
                    tied_rank_worker(Remainder,
                        tied_add_prev(Work, PrevValue),
                        {[Id],NewVal})
            end
    end.

ordered_ranks_of(List) when is_list(List) -&gt;
    ordered_ranks_of(List, tied_ranks_of(List), []).

ordered_ranks_of([], [], Work) -&gt;
    lists:reverse(Work);

ordered_ranks_of([Front|Rem], Ranks, Work) -&gt;
    {value,Item} = lists:keysearch(Front,2,Ranks),
    {IRank,Front} = Item,
    ordered_ranks_of(Rem, Ranks--[Item], [{IRank,Front}]++Work).</pre>
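<p>As for the predicate version of ranks_of() mentioned above, one possible shape for it is sketched here.  This is a guess at the idea, not the eventual ScUtil function: the module name ranks_sketch and the two-argument ranks_of/2 are hypothetical, and the predicate follows the lists:sort/2 convention of returning true when its first argument should come first.</p>
<pre style="padding-left: 30px">-module(ranks_sketch).
-export([ranks_of/2]).

%% SortFun(A, B) returns true when A should be ranked ahead of
%% (or level with) B.  Ties are listed as separate ranks.
ranks_of(List, SortFun) when is_list(List), is_function(SortFun, 2) -&gt;
    lists:zip(lists:seq(1, length(List)), lists:sort(SortFun, List)).</pre>
<p>Passing fun(A,B) -&gt; A &gt;= B end reproduces the largest-first behaviour of ranks_of() above, while fun(A,B) -&gt; A =&lt; B end ranks the smallest value first.</p>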
]]></content:encoded>
			<wfw:commentRss>http://fullof.bs/ranks-of-ordered-ranks-of-and-tied-ranks-of-statistics-in-erlang-part-4/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Standard Deviation, Root Mean Square and the Central Moments (Statistics in Erlang part 3)</title>
		<link>http://fullof.bs/standard-deviation-root-mean-square-and-the-central-moments-statistics-in-erlang-part-3/</link>
		<comments>http://fullof.bs/standard-deviation-root-mean-square-and-the-central-moments-statistics-in-erlang-part-3/#comments</comments>
		<pubDate>Mon, 11 Aug 2008 02:19:04 +0000</pubDate>
		<dc:creator>John Haugeland</dc:creator>
				<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tools and Libraries]]></category>
		<category><![CDATA[central moments]]></category>
		<category><![CDATA[kth mean]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[rms]]></category>
		<category><![CDATA[rmse]]></category>
		<category><![CDATA[root mean square]]></category>
		<category><![CDATA[root mean square error]]></category>
		<category><![CDATA[standard deviation]]></category>
		<category><![CDATA[std dev]]></category>
		<category><![CDATA[std deviation]]></category>
		<category><![CDATA[stddev]]></category>

		<guid isPermaLink="false">http://fullof.bs/?p=314</guid>
		<description><![CDATA[These are standard statistical tools for measuring differences, drift and error within sets, as well as Kth moments about the mean.  Central moments are also the building blocks of skewness and kurtosis, which are covered in a later post.  Root mean square is particularly useful as a measure of set error.  Standard deviation is great [...]]]></description>
			<content:encoded><![CDATA[<p>These are standard statistical tools for measuring differences, drift and error within sets, as well as Kth moments about the mean.  Central moments are also the building blocks of skewness and kurtosis, which are covered in a later post.  Root mean square is particularly useful as a measure of set error.  Standard deviation is great for figuring out how much spread/differentiation there is within a set.</p>
<p>This code requires the arithmetic mean stuff from <a title="Statistics in Erlang part 1" href="http://fullof.bs/arithmetic-mean-geometric-mean-harmonic-mean-and-weighted-arithmetic-mean-statistics-in-erlang-part" target="_blank">Statistics in Erlang part 1</a>.  There&#8217;s interesting, unrelated stuff in <a title="Statistics in Erlang part 2" href="http://fullof.bs/mean-median-mode-and-histograph-statistics-in-erlang-part-2" target="_blank">Part 2</a>.</p>
<p><span class="entry">As usual, <a title="StoneCypher's Utility Library" href="http://scutil.com/" target="_blank">this code is part of the ScUtil library</a>.  ScUtil is free and MIT licensed, because the GPL is evil.</span></p>
<p>This implements <a title="Issue 128 at Crunchy Development" href="http://crunchyd.com/forum/project.php?issueid=128" target="_blank">issue 128</a>.  This implements <a title="Issue 138 at Crunchy Development" href="http://crunchyd.com/forum/project.php?issueid=138" target="_blank">issue 138</a>.  This finishes implementing <a title="Issue 119 at Crunchy Development" href="http://crunchyd.com/forum/project.php?issueid=119" target="_blank">issue 119</a>.</p>
<pre style="padding-left: 30px">std_deviation(Values) when is_list(Values) -&gt;
    Mean = arithmetic_mean(Values),
    math:sqrt(arithmetic_mean([ (Val-Mean)*(Val-Mean) || Val &lt;- Values ])).

root_mean_square(List) when is_list(List) -&gt;
    math:sqrt(arithmetic_mean([ Val*Val || Val &lt;- List ])).

central_moments(Items) -&gt;
    Cnt  = length(Items),
    Mean = lists:sum(Items) / Cnt,
    TFun = fun(X) -&gt;
        Base = X-Mean, B2=Base*Base, B3=B2*Base, B4=B3*Base, {B2,B3,B4}
    end,
    collapse_central_moments(Cnt, [ TFun(I) || I &lt;- Items ], {0,0,0}).

collapse_central_moments(N, [], {WM2, WM3, WM4}) -&gt;
    { WM2/N, WM3/N, WM4/N };

collapse_central_moments(N, [{I2,I3,I4}|Rem], {WM2, WM3, WM4}) -&gt;
    collapse_central_moments(N, Rem, {I2+WM2, I3+WM3, I4+WM4}).</pre>
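<p>A quick worked example may help here.  The sketch below is self-contained so it can be compiled on its own; the module name spread_sketch is invented, arithmetic_mean/1 is assumed to behave like the Part 1 function, and sample_std_deviation/1 is an extra added for contrast, since std_deviation above divides by N (the population form) rather than N-1.</p>
<pre style="padding-left: 30px">-module(spread_sketch).
-export([arithmetic_mean/1, std_deviation/1,
         sample_std_deviation/1, root_mean_square/1]).

%% Assumed stand-in for the arithmetic mean from part 1.
arithmetic_mean(List) when is_list(List) -&gt;
    lists:sum(List) / length(List).

%% Population standard deviation: divides by N, as in the post.
std_deviation(Values) when is_list(Values) -&gt;
    Mean = arithmetic_mean(Values),
    math:sqrt(arithmetic_mean([ (V-Mean)*(V-Mean) || V &lt;- Values ])).

%% Sample standard deviation: divides by N-1, so needs two or more items.
sample_std_deviation(Values) when is_list(Values), length(Values) &gt; 1 -&gt;
    Mean  = arithmetic_mean(Values),
    SumSq = lists:sum([ (V-Mean)*(V-Mean) || V &lt;- Values ]),
    math:sqrt(SumSq / (length(Values) - 1)).

root_mean_square(List) when is_list(List) -&gt;
    math:sqrt(arithmetic_mean([ V*V || V &lt;- List ])).</pre>
<p>For [2,4,4,4,5,5,7,9] the mean is 5.0 and the squared differences sum to 32, so std_deviation/1 gives 2.0 while sample_std_deviation/1 gives roughly 2.138.  root_mean_square([1,7]) is the square root of (1+49)/2, which is 5.0.</p>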
]]></content:encoded>
			<wfw:commentRss>http://fullof.bs/standard-deviation-root-mean-square-and-the-central-moments-statistics-in-erlang-part-3/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Mean, Median, Mode and Histograph (Statistics in Erlang Part 2)</title>
		<link>http://fullof.bs/mean-median-mode-and-histograph-statistics-in-erlang-part-2/</link>
		<comments>http://fullof.bs/mean-median-mode-and-histograph-statistics-in-erlang-part-2/#comments</comments>
		<pubDate>Sun, 10 Aug 2008 23:01:07 +0000</pubDate>
		<dc:creator>John Haugeland</dc:creator>
				<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tools and Libraries]]></category>
		<category><![CDATA[histogram]]></category>
		<category><![CDATA[histograph]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[mean]]></category>
		<category><![CDATA[median]]></category>
		<category><![CDATA[mode]]></category>

		<guid isPermaLink="false">http://fullof.bs/?p=312</guid>
		<description><![CDATA[Mean is a complex topic, and is covered in Statistics in Erlang part 1.
Median and mode are less complex.  It&#8217;s worth noting that mode reports its results as a list, because it&#8217;s possible for there to be several modes for a list.  Also, mode is not strictly numeric &#8211; it works for mixed-type lists too [...]]]></description>
			<content:encoded><![CDATA[<p>Mean is a complex topic, and is covered in <a title="Statistics in Erlang part 1" href="http://fullof.bs/arithmetic-mean-geometric-mean-harmonic-mean-and-weighted-arithmetic-mean-statistics-in-erlang-part" target="_blank">Statistics in Erlang part 1</a>.</p>
<p>Median and mode are less complex.  It&#8217;s worth noting that mode reports its results as a list, because it&#8217;s possible for there to be several modes for a list.  Also, mode is not strictly numeric &#8211; it works for mixed-type lists too (you can take the mode of a list of atoms, for example.)</p>
<p>Mode is really just a reduction of the results of histograph, which likewise accepts lists of arbitrary types.  Median also uses a toy function, even_or_odd, which is provided here.</p>
<p>As usual, <a title="StoneCypher's Utility Library" href="http://scutil.com/" target="_blank">this code is part of the ScUtil library</a>.  ScUtil is free and MIT-licensed, because the GPL is evil.</p>
<p>This closes <a title="Issue 100 at Crunchy Development" href="http://crunchyd.com/forum/project.php?issueid=100" target="_blank">issue 100</a>.  This closes <a title="Issue 105 at Crunchy Development" href="http://crunchyd.com/forum/project.php?issueid=105" target="_blank">issue 105</a>.  This closes <a title="Issue 134 at Crunchy Development" href="http://crunchyd.com/forum/project.php?issueid=134" target="_blank">issue 134</a>.  This closes <a title="Issue 135 at Crunchy Development" href="http://crunchyd.com/forum/project.php?issueid=135" target="_blank">issue 135</a>.</p>
<pre style="padding-left: 30px">histograph(List) when is_list(List) -&gt;
    [Head|Tail] = lists:sort(List),
    histo_count(Tail, Head, 1, []).

histo_count([], Current, Count, Work) -&gt;
    lists:reverse([{Current,Count}]++Work);

histo_count([Current|Tail], Current, Count, Work) -&gt;
    histo_count(Tail, Current, Count+1, Work);

histo_count([New|Tail], Current, Count, Work) -&gt;
    histo_count(Tail, New, 1, [{Current,Count}]++Work).

even_or_odd(Num) when is_integer(Num) -&gt;
    if
        Num band 1 == 0 -&gt; even;
        true            -&gt; odd
    end.

median(List) when is_list(List) -&gt;
    SList = lists:sort(List),
    Length = length(SList),
    case even_or_odd(Length) of
        % even length: average the two middle items
        even -&gt; [A,B] = lists:sublist(SList, Length div 2, 2), (A+B)/2;
        % odd length: take the single middle item
        odd  -&gt; lists:nth( (Length+1) div 2, SList )
    end.

mode([]) -&gt; [];

mode(List) when is_list(List) -&gt;
    mode_front(lists:reverse(lists:keysort(2, scutil:histograph(List)))).

mode_front([{Item,Freq}|Tail]) -&gt;
    mode_front(Tail, Freq, [Item]).

mode_front([ {Item, Freq} | Tail], Freq, Results) -&gt;
    mode_front(Tail, Freq, [Item]++Results);

mode_front([{_Item,_Freq} |_Tail],_Better, Results) -&gt;
    Results;

mode_front([], _Freq, Results) -&gt; Results.</pre>
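<p>For comparison, here is a sketch of the same idea (tally the items, then keep the most frequent ones) using a map instead of sort-and-count.  This is a swapped-in technique, not how ScUtil does it; the names mode_sketch, tally and modes are made up for illustration, and maps:update_with/4 assumes OTP 19 or later.</p>
<pre style="padding-left: 30px">-module(mode_sketch).
-export([tally/1, modes/1]).

%% Count occurrences of each item; keys may be any term, so mixed-type lists work.
tally(List) when is_list(List) -&gt;
    lists:foldl(
        fun(Item, Acc) -&gt; maps:update_with(Item, fun(C) -&gt; C + 1 end, 1, Acc) end,
        #{},
        List).

%% Every item that reaches the highest count, sorted so the result is deterministic.
modes([]) -&gt; [];

modes(List) when is_list(List) -&gt;
    Counts = maps:to_list(tally(List)),
    Max    = lists:max([ C || {_Item, C} &lt;- Counts ]),
    lists:sort([ Item || {Item, C} &lt;- Counts, C =:= Max ]).</pre>
<p>With this sketch, mode_sketch:modes([a,b,b,c,c]) returns [b,c], which shows why the result has to be a list.</p>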
]]></content:encoded>
			<wfw:commentRss>http://fullof.bs/mean-median-mode-and-histograph-statistics-in-erlang-part-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Arithmetic Mean, Geometric Mean, Harmonic Mean and Weighted Arithmetic Mean (Statistics in Erlang Part 1)</title>
		<link>http://fullof.bs/arithmetic-mean-geometric-mean-harmonic-mean-and-weighted-arithmetic-mean-statistics-in-erlang-part/</link>
		<comments>http://fullof.bs/arithmetic-mean-geometric-mean-harmonic-mean-and-weighted-arithmetic-mean-statistics-in-erlang-part/#comments</comments>
		<pubDate>Sun, 10 Aug 2008 22:34:09 +0000</pubDate>
		<dc:creator>John Haugeland</dc:creator>
				<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tools and Libraries]]></category>
		<category><![CDATA[arithmetic mean]]></category>
		<category><![CDATA[average]]></category>
		<category><![CDATA[geometric mean]]></category>
		<category><![CDATA[harmonic mean]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[mean]]></category>
		<category><![CDATA[scutil]]></category>
		<category><![CDATA[stats]]></category>
		<category><![CDATA[weighted arithmetic mean]]></category>
		<category><![CDATA[weighted mean]]></category>

		<guid isPermaLink="false">http://fullof.bs/?p=309</guid>
		<description><![CDATA[I&#8217;ll be putting up some statistical functions I&#8217;ve had to write recently.  This is the first batch.  I confess, I find the Erlang implementations far more readable than the pure math definitions one finds around; I&#8217;ve been thinking about writing tutorials, but with this code here, I&#8217;m not entirely sure it&#8217;s necessary.
At any rate, the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be putting up some statistical functions I&#8217;ve had to write recently.  This is the first batch.  I confess, I find the Erlang implementations far more readable than the pure math definitions one finds around; I&#8217;ve been thinking about writing tutorials, but with this code here, I&#8217;m not entirely sure it&#8217;s necessary.</p>
<p>At any rate, the code follows.  As with so much of my Erlang code, <a title="StoneCypher's Utility Library" href="http://scutil.com/" target="_blank">this code is part of the ScUtil library</a>.  ScUtil is free and MIT-licensed, because the GPL is evil.</p>
<p>There are more statistics coming; I just don&#8217;t want to make any one post too huge, and I don&#8217;t want keyword saturation.</p>
<p>This partially closes <a title="Issue 119 at Crunchy Development" href="http://crunchyd.com/forum/project.php?issueid=119" target="_blank">issue 119</a>.  This closes <a title="Crunchy Development issue 133" href="http://crunchyd.com/forum/project.php?issueid=133" target="_blank">issue 133</a>.  This closes <a title="Crunchy Development issue 136" href="http://crunchyd.com/forum/project.php?issueid=136" target="_blank">issue 136</a>.  This closes <a title="Crunchy Development issue 137" href="http://crunchyd.com/forum/project.php?issueid=137" target="_blank">issue 137</a>.</p>
<pre style="padding-left: 30px">list_product(List) when is_list(List) -&gt;
    list_product(List, 1).

list_product([], Counter) -&gt;
    Counter;

list_product([Head|Tail], Counter) -&gt;
    list_product(Tail, Counter*Head).

arithmetic_mean(List) when is_list(List) -&gt;
    lists:sum(List) / length(List).

geometric_mean(List) when is_list(List) -&gt;
    math:pow(scutil:list_product(List), 1/length(List)).

harmonic_mean(List) when is_list(List) -&gt;
    length(List) / lists:sum([ 1/X || X&lt;-List ]).

weighted_arithmetic_mean(List) when is_list(List) -&gt;
    weighted_arithmetic_mean(List, 0, 0).

weighted_arithmetic_mean([], Num, Denom) -&gt;
    Num/Denom;

weighted_arithmetic_mean([{W,V}|Tail], Num, Denom) -&gt;
    weighted_arithmetic_mean(Tail, Num+(W*V), Denom+W).</pre>
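<p>Note that weighted_arithmetic_mean expects {Weight, Value} tuples, in that order.  The sketch below restates the same accumulation as a single fold so the tuple layout is easy to see; the module name wmean_sketch and the demo figures are made up for illustration.</p>
<pre style="padding-left: 30px">-module(wmean_sketch).
-export([weighted_mean/1, demo/0]).

%% Sum Weight*Value and Weight in one pass, then divide.
weighted_mean(Pairs) when is_list(Pairs) -&gt;
    {Num, Denom} = lists:foldl(
        fun({W, V}, {N, D}) -&gt; {N + W*V, D + W} end,
        {0, 0},
        Pairs),
    Num / Denom.

%% Hypothetical data: a value of 90 with weight 3 and a value of 70 with weight 1.
demo() -&gt;
    weighted_mean([{3, 90}, {1, 70}]).</pre>
<p>Here demo() works out to (3*90 + 1*70) / (3 + 1), which is 85.0.  With every weight set to 1, the result matches arithmetic_mean.</p>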
]]></content:encoded>
			<wfw:commentRss>http://fullof.bs/arithmetic-mean-geometric-mean-harmonic-mean-and-weighted-arithmetic-mean-statistics-in-erlang-part/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Implementing Geometric Mean in MySQL</title>
		<link>http://fullof.bs/implementing-geometric-mean-in-mysql/</link>
		<comments>http://fullof.bs/implementing-geometric-mean-in-mysql/#comments</comments>
		<pubDate>Fri, 04 Jul 2008 19:06:09 +0000</pubDate>
		<dc:creator>John Haugeland</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://fullof.bs/?p=206</guid>
		<description><![CDATA[This turns out to be pretty easy.  You might also want to read Implementing Harmonic Mean in MySQL.
Why bother blogging something this simple?  Because I couldn&#8217;t find one pre-made, and that means it&#8217;s time to bring my mighty Google rank of like negative two to bear to fix the problem.
CREATE TABLE example(val integer);
INSERT INTO example [...]]]></description>
			<content:encoded><![CDATA[<p>This turns out to be pretty easy.  You might also want to read <a title="Harmonic Mean" href="http://fullof.bs/implementing-harmonic-mean-in-mysql">Implementing Harmonic Mean in MySQL</a>.</p>
<p>Why bother blogging something this simple?  Because I couldn&#8217;t find one pre-made, and that means it&#8217;s time to bring my mighty Google rank of like negative two to bear to fix the problem.</p>
<pre style="padding-left: 30px">CREATE TABLE example(val integer);
INSERT INTO example VALUES(1),(2),(4),(8),(16);
SELECT exp(avg(ln(val))) as gmean from example;</pre>
<p>That&#8217;s it.  You should see output like this:</p>
<pre style="padding-left: 30px"><strong>mysql&gt; CREATE TABLE example(val integer);</strong>
Query OK, 0 rows affected (0.09 sec)

<strong>mysql&gt; INSERT INTO example VALUES(1),(2),(4),(8),(16);</strong>
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

<strong>mysql&gt; SELECT exp(avg(ln(val))) as gmean from example;</strong>
+-------+
| gmean |
+-------+
|     4 |
+-------+
1 row in set (0.00 sec)
</pre>
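<p>The query works because the logarithm of a product is the sum of the logarithms, so averaging ln(val) and exponentiating gives the n-th root of the product.  Here is a minimal sketch of the same identity in Erlang, assuming strictly positive values (math:log fails on zero); the module name geo_sketch and its function names are made up for illustration.</p>
<pre style="padding-left: 30px">-module(geo_sketch).
-export([by_product/1, by_logs/1, demo/0]).

%% Direct definition: the n-th root of the product of the values.
by_product(List) when is_list(List) -&gt;
    Product = lists:foldl(fun(X, Acc) -&gt; X * Acc end, 1, List),
    math:pow(Product, 1 / length(List)).

%% Log form, mirroring exp(avg(ln(val))) from the query above.
by_logs(List) when is_list(List) -&gt;
    math:exp(lists:sum([ math:log(X) || X &lt;- List ]) / length(List)).

%% True when both forms agree to within floating-point rounding.
demo() -&gt;
    Values = [1, 2, 4, 8, 16],
    abs(by_product(Values) - by_logs(Values)) &lt; 1.0e-9.</pre>
<p>Both forms come out at roughly 4 for the example table, matching the gmean column.</p>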
<p>Still haven&#8217;t figured out how to do central moments, skewness or kurtosis.</p>
]]></content:encoded>
			<wfw:commentRss>http://fullof.bs/implementing-geometric-mean-in-mysql/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Implementing Harmonic Mean in MySQL</title>
		<link>http://fullof.bs/implementing-harmonic-mean-in-mysql/</link>
		<comments>http://fullof.bs/implementing-harmonic-mean-in-mysql/#comments</comments>
		<pubDate>Fri, 04 Jul 2008 19:02:14 +0000</pubDate>
		<dc:creator>John Haugeland</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://fullof.bs/?p=207</guid>
		<description><![CDATA[This turns out to be pretty easy.  You might also want to read Implementing Geometric Mean in MySQL.
Why bother blogging something this simple?  Because I couldn&#8217;t find one pre-made, and that means it&#8217;s time to bring my mighty Google rank of like negative two to bear to fix the problem.
CREATE TABLE example(val integer);
INSERT INTO example [...]]]></description>
			<content:encoded><![CDATA[<p>This turns out to be pretty easy.  You might also want to read <a title="Geometric Mean" href="http://fullof.bs/implementing-geometric-mean-in-mysql">Implementing Geometric Mean in MySQL</a>.</p>
<p>Why bother blogging something this simple?  Because I couldn&#8217;t find one pre-made, and that means it&#8217;s time to bring my mighty Google rank of like negative two to bear to fix the problem.</p>
<pre style="padding-left: 30px">CREATE TABLE example(val integer);
INSERT INTO example VALUES(1),(2),(4),(8),(16);
SELECT count(val) / sum(1/val) as hmean from example;</pre>
<p>That&#8217;s it.  You should see output like this:</p>
<pre style="padding-left: 30px"><strong>mysql&gt; CREATE TABLE example(val integer);</strong>
Query OK, 0 rows affected (0.09 sec)

<strong>mysql&gt; INSERT INTO example VALUES(1),(2),(4),(8),(16);</strong>
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

<strong>mysql&gt; SELECT count(val)/sum(1/val) as hmean from example;</strong>
+--------+
| hmean  |
+--------+
| 2.5806 |
+--------+
1 row in set (0.00 sec)</pre>
<p>Still haven’t figured out how to do central moments, skewness or kurtosis.</p>
]]></content:encoded>
			<wfw:commentRss>http://fullof.bs/implementing-harmonic-mean-in-mysql/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
