A Better Erlang TCP listening pattern: addressing the fast packet loss problem

Erlang, Programming, Tools and Libraries, Tutorials 11 Comments

I’ve had mixed reactions to this when I’ve discussed it with people on IRC.  This may be well known to oldbear Erlang programmers.  I suppose it’s also possible that I’m wrong, though I’ve talked to several people I respect, and more than one of them has suggested that they were already aware of this problem.  If I’m wrong, please let me know; I’m open to the possibility that there’s a better answer that I just don’t know about.  I’ve never seen it discussed on the web, at least. Update: Serge Aleynikov points out that this TrapExit tutorial documents this approach.

That said, I think the problem is probably real.

I believe there is a significant defect in the idiomatic listener pattern, as discussed on most Erlang websites, as provided in most Erlang example code, and as found in many Erlang applications.  Once noticed, this defect is relatively easy to repair without significant impact on surrounding code, and I have added functionality to my ScUtil Library to handle this automatically under the name standard_listener.

The fundamental problem is essentially a form of race condition.  The idiomatic listener is, roughly:

do_listen(Port, Opts, Handler) ->

    case gen_tcp:listen(Port, Opts) of

        { ok, ListeningSocket } ->
            listen_loop(ListeningSocket, Handler);

        { error, E } ->
            { error, E }

    end.

listen_loop(ListeningSocket, Handler) ->

    case gen_tcp:accept(ListeningSocket) of

        { ok, ConnectedSocket } ->
            spawn(fun() -> Handler(ConnectedSocket) end),
            listen_loop(ListeningSocket, Handler);

        { error, E } ->
            { error, E }

    end.

Now consider the case of a server under very heavy load.  Further, consider that the listening socket is opened either {active,true} or {active,once}, which is true in the vast bulk of Erlang network applications, meaning that packets are delivered automatically, as messages, to the socket owner.  (Accepted sockets inherit these options from the listening socket.)  The general pattern is that the listener accepts a connection, spawns a handling process, passes the connected socket to that handling process, and the handling process takes ownership of the socket.

The problem is that it takes time for all of that to happen, and Erlang doesn’t specify or allow you to control its timeslicing behavior (as well it should not).  As active sockets are managed by a standalone process, this means that if the connecting client is fast and the network is fast, the first packet (even the first several under extreme circumstances) could be delivered before the socket has been taken over by the handling PID, meaning that its contents would be dispatched to the wrong process, with no indication of where they were meant to go.  This invalidates connections and fills a non-discarding mailbox, which is a potentially serious memory leak (especially given that Erlang’s response to out-of-memory conditions is to abort the entire VM).
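A minimal sketch of where early data lands, assuming a loopback client that sends the moment it connects. The module name race_demo and the timeouts are made up for illustration. It does not model the handoff to a handler; it only shows that, with an active listener, data arriving before any handoff is delivered to the process that called accept.

```erlang
-module(race_demo).
-export([run/0]).

%% Opens an {active, true} listener on an ephemeral port, lets a client
%% connect and send immediately, and reports what the acceptor finds in
%% its own mailbox before any handler has been given the socket.
run() ->
    {ok, Listener} = gen_tcp:listen(0, [binary, {active, true}, {reuseaddr, true}]),
    {ok, Port} = inet:port(Listener),

    %% The "fast client": connect, send at once, then linger briefly.
    spawn(fun() ->
        {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),
        ok = gen_tcp:send(Client, <<"hello">>),
        timer:sleep(500),
        gen_tcp:close(Client)
    end),

    {ok, Socket} = gen_tcp:accept(Listener),

    %% No handler owns Socket yet, so the data is a message to this process.
    Result = receive
        {tcp, Socket, Data} -> {misdelivered, Data}
    after 1000 ->
        nothing_received
    end,

    gen_tcp:close(Socket),
    gen_tcp:close(Listener),
    Result.
```

Calling race_demo:run() from the shell after compiling shows which process received the first packet.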

Obviously, this is intolerable.  There are better answers, though, than to switch to {active,false}.  One suggestion I heard was to pre-spawn handlers in order to reduce the gap time, but that doesn’t solve the problem; it just makes it less likely.

The approach that I took is to lie.  standard_listener takes the following steps to resolve the problem:

  1. Add the default {active,true} to the inet options list, if it isn’t already present.
  2. Strip out the {active,Foo} from the inet options list, and store it as ActiveStatus.
  3. Add {active,false} to the inet options list, and use that options list to open the listener.
  4. When spawning handler processes, pass a shunt as the starting function, taking the real handling function and the real ActiveStatus as arguments.
  5. The shunt sets the real ActiveStatus from inside the handler process, at which point the socket begins delivering messages.

This neatly closes the problem.  A free, MIT-licensed implementation can be found in ScUtil beginning in version 96.  A simplified, corrected example follows for immediate reference; the version in ScUtil is more feature-complete.

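do_listen(Port, SocketOptions, Handler) ->

    % Remember the mode the caller asked for; gen_tcp defaults to {active,true}.
    ActiveStatus = case proplists:get_value(active, SocketOptions) of
        undefined -> true;
        Other     -> Other
    end,

    % Always open the listener passive, so accepted sockets start out silent.
    FixedOpts = proplists:delete(active, SocketOptions)
             ++ [{active, false}],

    case gen_tcp:listen(Port, FixedOpts) of

        { ok, ListeningSocket } ->
            listen_loop(ActiveStatus, ListeningSocket, Handler);

        { error, E } ->
            { error, E }

    end.

listen_loop(ActiveStatus, ListeningSocket, Handler) ->

    case gen_tcp:accept(ListeningSocket) of

        { ok, ConnectedSocket } ->
            % shunt/3 must be exported for this spawn to work.
            Pid = spawn(?MODULE, shunt, [ActiveStatus,ConnectedSocket,Handler]),
            % Only the current owner (this process) can give the socket away.
            case gen_tcp:controlling_process(ConnectedSocket, Pid) of
                ok ->
                    Pid ! { socket_ready, ConnectedSocket };
                { error, _ } ->
                    exit(Pid, kill),
                    gen_tcp:close(ConnectedSocket)
            end,
            listen_loop(ActiveStatus, ListeningSocket, Handler);

        { error, E } ->
            { error, E }

    end.

shunt(ActiveStatus, ConnectedSocket, Handler) ->

    % Wait until this process owns the socket, then switch on the real mode.
    receive { socket_ready, ConnectedSocket } -> ok end,
    inet:setopts(ConnectedSocket, [{active, ActiveStatus}]),
    Handler(ConnectedSocket).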
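For something that can be pasted into a file and run, here is a minimal self-contained sketch of the same idea with an echo handler and a loopback self-test. The module name shunt_demo, the socket_ready message, and the start/2, echo/1 and self_test/0 helpers are illustrative assumptions, not ScUtil's API. In this sketch the acceptor hands ownership to the handler process first, and only then does the handler turn the requested active mode back on.

```erlang
-module(shunt_demo).
-export([start/2, shunt/3, echo/1, self_test/0]).

%% Starts an acceptor on an ephemeral port. The listener is always opened
%% {active, false}; the mode the caller asked for is applied per connection.
start(Opts, Handler) ->
    Parent = self(),
    Acceptor = spawn(fun() ->
        Active = proplists:get_value(active, Opts, true),
        Fixed  = [{active, false} | proplists:delete(active, Opts)],
        {ok, Listener} = gen_tcp:listen(0, Fixed),
        {ok, LPort} = inet:port(Listener),
        Parent ! {listening, self(), LPort},
        accept_loop(Active, Listener, Handler)
    end),
    receive
        {listening, Acceptor, Port} -> {ok, Acceptor, Port}
    after 2000 ->
        {error, timeout}
    end.

accept_loop(Active, Listener, Handler) ->
    case gen_tcp:accept(Listener) of
        {ok, Socket} ->
            Pid = spawn(?MODULE, shunt, [Active, Socket, Handler]),
            %% Only the current owner may transfer the socket.
            case gen_tcp:controlling_process(Socket, Pid) of
                ok ->
                    Pid ! {socket_ready, Socket};
                {error, _} ->
                    exit(Pid, kill),
                    gen_tcp:close(Socket)
            end,
            accept_loop(Active, Listener, Handler);
        {error, Reason} ->
            {error, Reason}
    end.

%% Runs inside the handler process: wait for ownership, then go active.
shunt(Active, Socket, Handler) ->
    receive {socket_ready, Socket} -> ok end,
    ok = inet:setopts(Socket, [{active, Active}]),
    Handler(Socket).

%% Example handler for {active, true}: echo everything back.
echo(Socket) ->
    receive
        {tcp, Socket, Data} ->
            gen_tcp:send(Socket, Data),
            echo(Socket);
        {tcp_closed, Socket} ->
            ok
    end.

%% Connects over loopback, sends immediately, and expects the echo.
self_test() ->
    {ok, _Acceptor, Port} = start([binary, {active, true}], fun echo/1),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),
    ok = gen_tcp:send(Client, <<"ping">>),
    {ok, <<"ping">>} = gen_tcp:recv(Client, 4, 2000),
    gen_tcp:close(Client).
```

A handler written for {active,once} would need to call inet:setopts(Socket, [{active, once}]) again after each message; the echo handler here assumes {active,true}.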

Moments, Skewness and Kurtosis (Statistics in Erlang part 8)

Erlang, Math, Programming, Statistics, Tools and Libraries, Tutorials No Comments

So, it was pointed out to me that I had the central moments, but not the moments, i.e., the ones not normalized against the input’s average.  Also, it was pointed out that most people don’t know that kurtosis and skewness are related to the central moments.  Furthermore, it turns out (and I didn’t know this) that there are in fact meaningful uses of floating-point exponents in moments.

So, I implemented moments, I replaced my central moments implementation, and I gave name wrappers for skewness and kurtosis to make them easier to identify.

This closes issues 169, 170, 171, 172 and 173.  As usual, this code is part of the ScUtil library, which is free and MIT licensed, because the GPL is evil.

% Raw moment: the mean of each item raised to the Nth power.
moment(List, N) when is_list(List), is_number(N) ->
    scutil:arithmetic_mean( [ math:pow(Item, N) || Item <- List ] ).

% Second, third and fourth raw moments by default.
moments(List) ->
    moments(List, [2,3,4]).

moments(List, Moments) when is_list(Moments) ->
    [ moment(List, M) || M <- Moments ].

% Central moment: as above, but measured from the list's arithmetic mean.
central_moment(List, N) when is_list(List), is_number(N) ->
    ListAMean = scutil:arithmetic_mean(List),
    scutil:arithmetic_mean( [ math:pow(Item-ListAMean, N) || Item <- List ] ).

central_moments(List) ->
    central_moments(List, [2,3,4]).

central_moments(List, Moments) when is_list(Moments) ->
    [ central_moment(List, M) || M <- Moments ].

% Name wrappers: the third and fourth central moments, not divided by
% any power of the standard deviation.
skewness(List) -> central_moment(List, 3).
kurtosis(List) -> central_moment(List, 4).
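The wrappers above return the third and fourth central moments as they are. The textbook skewness and kurtosis additionally divide by the standard deviation cubed and to the fourth power, respectively. A minimal standalone sketch of those standardized forms follows; the module name moments_demo and its local mean/1 are assumptions made so the block runs without ScUtil, and it is not ScUtil's implementation.

```erlang
-module(moments_demo).
-export([mean/1, central_moment/2, skewness/1, kurtosis/1]).

%% Arithmetic mean of a non-empty list of numbers.
mean(List) when is_list(List), List =/= [] ->
    lists:sum(List) / length(List).

%% Nth central moment: mean of (X - Mean)^N.
central_moment(List, N) when is_number(N) ->
    Mean = mean(List),
    mean([math:pow(X - Mean, N) || X <- List]).

%% Standardized skewness: m3 / m2^(3/2).
%% Fails with badarith when every item is equal (zero variance).
skewness(List) ->
    central_moment(List, 3) / math:pow(central_moment(List, 2), 1.5).

%% Standardized (non-excess) kurtosis: m4 / m2^2.
%% Same zero-variance caveat as skewness/1.
kurtosis(List) ->
    central_moment(List, 4) / math:pow(central_moment(List, 2), 2).
```

For [1,2,3,4,5] the second central moment is 2.0 and the fourth is 6.8, so kurtosis/1 gives 1.7, and the symmetric input gives a skewness of zero.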

As long as I’m posting random crap, how about eJabberD Install Docs

Erlang, Miscellaneous, Tools and Libraries, Tutorials 1 Comment

I had to write these for a colleague some months ago, and promptly forgot about them in classic fashion. Here’s something to mock my inability to write coherent install docs for posterity: Setting Up eJabberD From Scratch. A how-to, of sorts, I guess. This was written for a CentOS server, but is probably accurate for most Unices (I don’t really know for sure).

Should I be writing about Erlang?

Erlang, Programming, Tutorials 7 Comments

I’m becoming ever more convinced that the answer is yes. I’ve been playing a bit of Project Euler, a game for programmers wherein the object is to find solutions to deceptively simple problems. It’s surprisingly entertaining, and your score for a problem is a function of how many programmers have not yet solved it.

There are people who take long roundabout approaches to get to results like these, when instead they could be doing things like

p1() -> lists:sum(

    [ X || X <- lists:seq(1,10),

      ((X rem 3) == 0) orelse ((X rem 5) == 0) ]

).
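As a sketch, the same comprehension can be parameterized on the limit. The module name euler1 and the function name sum_multiples_below/1 are made up for illustration. Note that lists:seq(1,10) includes 10, so p1() as written sums 3, 5, 6, 9 and 10 to get 33, whereas Project Euler's first problem counts multiples strictly below the limit (23 for a limit of 10).

```erlang
-module(euler1).
-export([sum_multiples_below/1]).

%% Sum of all positive integers strictly below Limit that are
%% multiples of 3 or 5.
sum_multiples_below(Limit) when is_integer(Limit), Limit >= 1 ->
    lists:sum(
        [ X || X <- lists:seq(1, Limit - 1),
               (X rem 3 =:= 0) orelse (X rem 5 =:= 0) ]
    ).
```

euler1:sum_multiples_below(10) evaluates to 23, and the same call with 1000 answers the actual problem.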

As a result, I’m starting to think that I need to start explaining things. Anyone agree or disagree?

Stone PHP SafeCrypt: Convenient, Secure and Typesafe Encryption (Tutorial, Library and Test Code)

PHP, Programming, Tools and Libraries, Tutorials 44 Comments

When wandering around Das Intarweb one sees a lot of sad, sad code. In fact, people who should know better get busted on weak crypto all the time. Indeed, even the PHP manual examples have unacceptable security flaws. When one is writing encryption code for one’s own site, that turns into a problem. Here’s a wrapper library and a little primer on using the standard encryption facilities in PHP safely and correctly.


Checklist for Embedded IE

AJAX, C/C++, DOM, ECMA / Javascript, Internet Explorer, Programming, Tools and Libraries, Tutorials, Web and Web Standards 2 Comments

MSHTML is an awesome user interface tool, but it has a whole lot of standard behaviors, many of which aren’t what one wants for an application (since it’s designed for the web.) This is a list of stuff you need to do to embed IE COM and have it behave like a normal application. There’s more than a person might expect.


Oh, Neat: A Hack To Fix The IE Click Problem

C/C++, Internet Explorer, Programming, Tutorials 4 Comments

I always wondered how applications suppressed that god-awful clicking sound in embedded IE. Usually you catch them screwing with user preferences, but there’s a utility I have which never showed any apparent method of getting rid of the goddamned noise. I always assumed they just styled text to look like a link.

Well, I still don’t know what he does, but I found a way.
