XMLHttpRequest – So Close and yet So Far
April 25, 2006 5:37 pm AJAX, DOM, ECMA / Javascript, Programming, Rants, Web and Web Standards
To be plain, I think the XMLHttpRequest object is quite wonderful. It’s a deeper realization than I’d had when I was talking about RFC2557+ some years ago; rather than binding people to URLs, it frees them to use sockets.
Or does it?
There is one simple truism that has shown itself over and over again in programming. With the new toolset available, I don’t think there’s anyone who’s going to argue that web programming isn’t programming anymore. It’s becoming the case that what we learned in software engineering is starting to show true in web development. In this respect, XMLHttpRequest shows several promises which aren’t immediately obvious, and several pitfalls which apparently haven’t been noticed at all.
The truism to which I refer is that nothing is ever enough for a programmer. That’s not as broad as it sounds; it leads to a few very specific and very important realizations which need to be accounted for.
- Things need to be modular, because we’ve scaled too far to even consider looking at systems as a whole for small changes. This is one of the web’s strongest points, in large part owing to the fundamental nature of SGML as a meta-markup and the brilliant guiding hand who was Jon Postel. In this respect, XMLHttpRequest is well on its way to success; the standard is beginning to take into account languages other than ECMA derived scripts, and sockets are already extremely modular.
- Things need to be resilient. There are adequate mechanisms to determine state and error state, though a callback for connection failure would have been nice (though, admittedly, it’s easy enough to implement, which I’ll show in a later blog post.)
- Things need to attach to their host environment in a natural fashion. This has traditionally been a problem for languages like FormulaONE and Prolog, though it seems Mozart-Oz is working them out. This is arguably XMLHttpRequest’s greatest strength, the foresight of the responseXML property. Unfortunately, it also helps mask one of the biggest failings (the only serious one that I see, actually.)
- There will come a point at which the programmer needs to do something that isn’t covered. Languages, libraries, interfaces and settings which allow the programmer to get to the meat and start replacing things are extremely desirable. Various examples of this abound; few have had as much success as the Standard Template Library or URLs. In this fashion, XMLHttpRequest appears to have its bases covered, with the responseText mechanism. You can receive things that aren’t XML and work with them how you see fit, which should open up entirely new vistas of usability and blah blah blah if you’re just willing to do some string parsing. The idea is that since you can do raw parsing, and since these are sockets, you can use the network openly. Herein lies XMLHttpRequest’s secret flaw – responseText is crippled by the connection mechanism. This isn’t actually good for much more than porting pre-XML legacy web services.
See, I started writing this AJAX tutorial, and I was getting all into it. I started by writing a lightweight DOM calculator to get people used to working on the document model from a scripting language. It went well, and Uncle Fatty was happy. The idea was to move from that to writing a chess game. Since I used to work for one of the big chess service providers, I decided that what I actually ought to write was a client which touched their servers.
Ha ha. Joke’s on me. See, I fell for it too: if it does responseText, then I can just parse whatever I get manually, and everything’ll be flowers and syrup, right? So, I start by porting my old style13-2 and style12 parsers, so that I can touch FreeChess, ChessClub and ChessLive. Porting them into ECMA was moderately tehsuck, but to be honest it coulda been worse, so I’m not complaining much. Built those on edit boxes, fed it a command, it reacted, and eventually I found the right neurons to make the frog jump, and everything was hoppy. Spent some time nailing stupid effects from Rico and Scriptaculous onto it, and generally had myself a fun waste of a couple of hours. Even hammered out a froody little chat system.
Then, it’s time to get down to the horror that is nailing XMLHttpRequest onto the service, right? Because I can parse whatever I receive, so it’ll be candy and bee-farts, right? Ha, ha: you sir got screwed.
Y’see, folks, it’s called XMLHttpRequest for a reason. It’s not really sockets, no matter how much .responseText wants to trick you into thinking it is. It doesn’t matter if you can parse whatever random damn thing you receive, because of one ugly detail: there is no way to make a connection without sending HTTP headers first. Now, I’m sure several people are laughing at me for being stupid right now. On the one hand, I did earn it, and so I deserve it; on the other hand, shut up. At first I thought I was going to fake it by turning off all the headers, but it turns out that the definition of setRequestHeader essentially prohibits it from being possible at all. Thing is, it’s actually very easily reparable. Now, let’s pretend for a moment that security is the province of the browser author and not the specification, and that raw sockets haven’t existed safely in Flash for quite a while already. Raw sockets themselves aren’t fundamentally dangerous, and because of the way buffering is bound up in the object, it’s actually a pretty safe playground.
The more important bit, though, is how ridiculously powerful such a tool would be. That would, in many ways, be what X always wanted to be: a strong, powerful, scripted, media-rich and gracefully degrading platform for semi-thin network applications. This has the advantage of having all the “server” behavior bound up in the web browsers, already far better standardized and far more powerful than X ever was (I’m sure I’ll take flak for saying that.) My chess client is a good example of what could be powerful here: if I could just say “no, don’t send the HTTP headers,” suddenly I would have a portable rich chess client in trivially little code.
As of now, I have to write a stupid little server whose entire purpose is to intercept AJAX, strip the HTTP headers away, then act as a passthrough to the real server. It’s a waste of machine time, a waste of bandwidth and of coding effort. And, the thing is, I think if we don’t change the XMLHttpRequest object to accommodate, we’re going to see a lot of just that kind of behavior. Sure, the original intent of the object was to allow web browsers to embed browsing behavior in their scripting, essentially allowing scripts to act like clients. That said, sometimes a mechanism is far more powerful than its original intent (Tim Berners-Lee’s invention of a way to keep data around the particle collider up to date is a beautiful example, because I’m using its grandchild to put this argument together right now.)
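A rough sketch of the heart of such a passthrough, written in JavaScript for a server-side runtime such as Node.js (an assumption; the actual server isn't shown here). The helper name `extractBody`, the hostname, the path and the payload are all made up for illustration. It takes roughly what an XMLHttpRequest POST puts on the wire, throws away the request line and headers, and keeps only the bytes the real server wanted in the first place.

```javascript
// Hypothetical helper: return only the body of a raw HTTP request.
// HTTP separates the header block from the body with one blank line (CRLF CRLF).
function extractBody(rawRequest) {
  const separator = "\r\n\r\n";
  const cut = rawRequest.indexOf(separator);
  if (cut === -1) {
    // The header block is not complete yet, so there is nothing to forward.
    return null;
  }
  return rawRequest.slice(cut + separator.length);
}

// Roughly what a browser sends for a POST (hostname and path are invented).
const onTheWire =
  "POST /relay HTTP/1.1\r\n" +
  "Host: chess.example.org\r\n" +
  "Content-Type: text/plain\r\n" +
  "Content-Length: 14\r\n" +
  "\r\n" +
  "hello server\r\n";

// Only this part would be passed through to the real, non-HTTP server.
const payload = extractBody(onTheWire);
```

Everything before the blank line is overhead the target service never asked for; a real relay would also have to manage the outbound socket and carry replies back, which this sketch leaves out.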
Sometimes small modifications to allow the programmer better control are worth their weight in gold. Yes, I realize it was an HTTP request mechanism. I think it shouldn’t be. The allowance is made for raw receiving in responseText, because the utility is obvious. The allowance should be made for raw sending, too. I believe it’s significantly important. This would be a major step towards portable rich applications with network capabilities. The actual implementation shift is tiny, well-understood and safe. This really, really needs to be done.
Of course, once you do that, another fairly serious issue rears its head: that this is a one-directional avenue for communication. We want to build deeply connected apps, here! We always have, back through SOAP and DCORBA and X11 and UUX and so on. There are several docs out there dedicated to the issue of emulating server push, many of which use browser extensions or to get the job done. A much more common answer, since it’s portable, is polling, which is immensely work- and bandwidth-wasteful, and puts a direct server tax on reducing lag (the faster they poll, the harder you have to work to cover something that really should just be data being sent out.) There are already tons of applications which work this way, and it’s just going to get worse. However, there is an unexplained parameter in the connection list called “async” which may handle exactly this; if so, the second set of needs is obviated.
On those grounds, I recommend the following alterations to the XMLHttpRequest standard:
Overview:
To support the direction that AJAX is already taking itself and to reduce the implementation cost of certain commonly desired features, we will define mechanisms to implement raw connections aside from the default HTTP connections, and to receive server pushed data without extraneous polling mechanisms. This requires minor but significant alterations to several parts of the proposed standard, and though the implementation cost of these changes is small, it requires a significant re-evaluation of the potential of the mechanism.
In the definition of open(), the modes GET, POST and PUT are required. Two other methods are questioned, suggesting that there is still flexibility in the mode list. That’s good: this is the natural place to add raw connection. I propose a fourth mode: RAW.
- It seems appropriate that RAW be placed in the MUST support group; things can be supported and turned off for security reasons, and it’s important that we be able to write software which can check for these things and then move on.
- RAW suppresses all HTTP negotiation and all HTTP headers. Headers may not be set with setRequestHeader() on a RAW method connection.
- The current send-receive readyState reporting mechanism would not be sufficient for RAW. Likely the most appropriate thing to do would be to add another enumerant to the state to represent bidirectional transfer, and to change the state transition rules for RAW sockets. This seems likely to result in the most amenable environment to older code. The state changes would need to be altered to allow send() on a RAW socket during any state other than 0. Because of the editorial note suggesting that readyState transition be amended for HEAD method requests, it seems the editorial body is already amenable to different send() semantics for different readyStates.
- There is the question of how to handle sequential input and sequential output. Receipt on a RAW socket should probably just append onto the existing buffer, as well as attempting to send while send is already underway (or, the current exception behavior could be maintained for the latter case; that’s a matter of opinion, but I believe queueing is appropriate.)
- There is also the argument that each distinct read be cached in a separate entry in an array, but without the HTTP transport, there would need to be a way to distinguish packet breakup from separate transfers; it seems inappropriate, especially given that this is meant to be a raw socket, and as such I believe the former answer is superior.
Actually, the second point as a casualty covers server push, too: with the bidi mode in place, you can safely handle any state transition, so we can just happily fire lots of state changes to cover each server push. This makes async (probably) unnecessary. Because many deployed AJAX codebases don’t account for such situations, it may be appropriate to keep the current behavior stable, and to add another HTTP request type that has RAW-style state transitions (this would also neatly handle the HEAD problem) called, say, PUSH, which would allow the immediate deployment of network applications which aren’t tied to the client-server pipe mindset.
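The rules proposed above can be modelled in a few lines of JavaScript. This is only a toy model of the proposal, not anything a browser implements: the class name `RawConnectionModel`, the `receive()` hook and the value 5 for the extra readyState are all assumptions made for the sketch, and no real network traffic is involved.

```javascript
// Toy model of the proposed RAW mode. None of these names exist in any browser.
const UNINITIALIZED = 0;
const OPEN = 1;
const BIDIRECTIONAL = 5; // assumed value for the proposed extra readyState

class RawConnectionModel {
  constructor() {
    this.readyState = UNINITIALIZED;
    this.responseText = "";      // one growing buffer, not an array of reads
    this.sendQueue = [];         // outgoing data waiting for the wire
    this.onreadystatechange = null;
  }

  open() {
    this.changeState(OPEN);
  }

  // send() is legal in every state except 0, and queues instead of throwing
  // when an earlier send is still pending.
  send(data) {
    if (this.readyState === UNINITIALIZED) {
      throw new Error("send() called before open()");
    }
    this.sendQueue.push(data);
    this.changeState(BIDIRECTIONAL);
  }

  // Stand-in for the network layer: called whenever bytes arrive,
  // including unsolicited server pushes.
  receive(data) {
    this.responseText += data;
    this.changeState(BIDIRECTIONAL);
  }

  // Every transition notifies the page, so each push is visible to script.
  changeState(next) {
    this.readyState = next;
    if (this.onreadystatechange) {
      this.onreadystatechange();
    }
  }
}

const conn = new RawConnectionModel();
let notifications = 0;
conn.onreadystatechange = () => { notifications += 1; };
conn.open();
conn.send("first\n");
conn.send("second\n"); // queued behind the first send
conn.receive("abc");
conn.receive("def");   // appended to the same buffer
```

The model picks the queueing option for overlapping sends and the single-buffer option for receipts, matching the preferences stated in the list; swapping in the exception behavior would be a one-line change in `send()`.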
This approach has several advantages:
- This approach is similar to concepts already under discussion and uses similar mechanisms, but covers several issues under consideration with less work, and in a way which opens up an extremely powerful new tool.
- This approach would have essentially zero impact on deployed code, and would cause no disruptions to existing deployments.
- This approach can be handled quite easily in a way to degrade gracefully in situations where a polling style mechanism or something similar provides an adequate substitute.
- This approach drastically reduces idle bandwidth for certain network connection behaviors.
- This approach is extremely easy to implement at a browser level, consisting essentially of work that’s already done with bits ripped out. It is likely to be extremely quickly deployed if adopted by a standards body. Hacks to approach both of the behaviors this allows are already becoming widespread. We need to open this floodgate early so that it can be handled in a standard fashion; browsers are already starting to expose their guts, and there’s a possible browser incongruity in the near future if we don’t get this handled right now.
- This provides an extremely pleasant new avenue for thin client development, and will catch like wildfire in the existing AJAX crowd.
Please give these changes serious consideration. If you support them, get some trackbacks going and get the word out. This is the sort of thing that not too many people will see the importance of, so it’s important that the word get around so that the right people can pick up the sound. If you don’t agree, tell me why.
XMLHttpRequest is in the hands of two people, and small groups of people are often more likely to listen than large groups. Their minds can be changed. Want real HTML network clients? Help me change them.

April 25th, 2006 at 6:25 pm
Wow, I didn’t even realize XMLHttpRequest didn’t support these very important things. So you’re saying they don’t have
1. persistent open connections with server-push, or
2. non-http connections
That’s crazy.
Here’s a quick summary of how flash compares: (I’m a hardcore flash programmer by day)
Flash 6 was released on March 15, 2002, (4 years ago, an eternity in internet time) which contained XMLSocket — raw persistent sockets, which let you pass any data you want, with one caveat: all messages must be terminated with a null byte (‘\0’). The socket stays open as long as you want, and you get a callback every time the client receives a null-terminated message. Even though it’s called XMLSocket, you can opt to skip the xml parsing and fiddle with the text directly. It’s really very simple and easy to use IMO.
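The null-terminated framing described above is easy to sketch. What follows is JavaScript rather than ActionScript and is not Flash's actual implementation; the function name `createNullFramer` and the sample messages are invented for illustration. It shows how a receiver can turn a stream that arrives in arbitrary pieces into one callback per complete message.

```javascript
// Hypothetical framer: buffers incoming text and emits one callback per
// null-terminated message, however the stream happens to be split up.
function createNullFramer(onMessage) {
  let pending = "";
  return function feed(chunk) {
    pending += chunk;
    let end = pending.indexOf("\0");
    while (end !== -1) {
      onMessage(pending.slice(0, end));   // message without its terminator
      pending = pending.slice(end + 1);   // keep whatever follows for later
      end = pending.indexOf("\0");
    }
  };
}

const received = [];
const feed = createNullFramer((message) => received.push(message));

// The second message is split across two deliveries on purpose.
feed("<move>e4</move>\0<mo");
feed("ve>e5</move>\0");
```

After both calls, `received` holds the two complete messages in order, and nothing is left waiting in the buffer.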
Flash 9 (currently in beta, expected to release this summer) contains raw binary sockets, I haven’t looked into them, but already I’ve seen a flash-only -no-server-hacks pop3 client and a vnc client, so I believe all limitations are removed.
April 25th, 2006 at 6:39 pm
[...] Not quite DS related but considering my first love is web applications development, I figured I should post this. I agree 100% that XMLHttpRequest should be expanded on for when XHR becomes a W3C standard. Give us a way where we can truly make the web as dynamic as it should be. [...]
April 25th, 2006 at 6:44 pm
[...] So I’ve been ignoring the whole ajax/js/dhtml thing, only doing flash programming. It turns out that the implementation of XmlHttpRequest has a few major shortfalls. Short summary of linked article: [...]
April 25th, 2006 at 9:30 pm
I do hope you’re going to send this to the mailing list.
April 25th, 2006 at 9:38 pm
I would, if you had left an address in that link
April 26th, 2006 at 3:06 am
Looks like wordpress filters out mailto: urls. The address is copied from the “status of this document” section of the working draft you linked in the first sentence.
April 26th, 2006 at 5:19 pm
Oh, you mean the two maintainers? Yes, the reason I wrote this blog post was to get comments, criticism and groundswell support, specifically to sway them. That’s to whom I was referring when talking about small groups.
May 3rd, 2006 at 7:00 pm
No polling is needed; the async flag does what you need. It will give you a callback every time it gets another chunk of data in. If you lose the connection, you’ll get a callback so you know to establish another one.
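A minimal sketch of reading data as it trickles in, assuming a browser that exposes the partial response in responseText while readyState is 3 (this varied between browsers at the time). The helper name `createIncrementalReader` and the stand-in object `fakeXhr` are hypothetical; the stand-in is only there so the sketch runs without a browser.

```javascript
// Hypothetical helper: hand each newly arrived slice of responseText to onChunk.
function createIncrementalReader(onChunk) {
  let consumed = 0; // how much of responseText has already been delivered
  return function handleStateChange(xhr) {
    // 3 = receiving (partial data may be available), 4 = loaded
    if (xhr.readyState !== 3 && xhr.readyState !== 4) {
      return;
    }
    const text = xhr.responseText;
    if (text.length > consumed) {
      onChunk(text.slice(consumed));
      consumed = text.length;
    }
  };
}

// Stand-in for a real request object.
const fakeXhr = { readyState: 3, responseText: "" };
const chunks = [];
const handle = createIncrementalReader((chunk) => chunks.push(chunk));

fakeXhr.responseText = "move e4\n";
handle(fakeXhr);
fakeXhr.responseText = "move e4\nmove e5\n";
handle(fakeXhr);
fakeXhr.readyState = 4;
handle(fakeXhr); // nothing new arrived, so no extra chunk is delivered
```

In a page, the same handler would be wired up with something like `xhr.onreadystatechange = function () { handle(xhr); };` on an asynchronous request.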
As for raw sockets, they would be nice for some applications but are overkill for everything you talked about doing. All XMLHttpRequest really needs is the ability to incrementally POST data. It would do it over the Chunked transfer-encoding, so it’s all standard HTTP and you don’t need to do your own hacked-together protocol to handle the same issues. I was very surprised when I discovered that this isn’t possible; it seems like a really silly omission.
In your blog entry, you seem to be implying that the overhead comes from constantly re-sending headers. I disagree; I would be very surprised if that were the case. The overhead (and it is substantial!) comes from re-establishing the connection. You have to redo the whole TCP slow-start stuff, and your server has to pay attention and reassociate you with your session.
Regardless, allowing chunked POSTs would solve both problems anyway.
(The only annoying thing is the STUPID decision in HTTP to transmit chunk sizes in variable-length hex notation. What were they thinking? A huge part of the point of sending length prefixes is so you can read exact byte counts and not have to mess around with nonblocking i/o! And if it’s variable length, then why use hex instead of decimal? It’s like they had two committees that came to opposite conclusions, so they picked half of each’s decision!)
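For anyone who hasn't looked at it, the chunk framing being complained about looks like this. It is a small JavaScript sketch of the HTTP/1.1 chunked transfer coding (size in hex, CRLF, data, CRLF, with a zero-size chunk at the end); the helper name `encodeChunk` and the sample data are invented for illustration, and chunk extensions and trailers are ignored.

```javascript
// Encode one chunk: <size in hex> CRLF <data> CRLF.
// The size counts bytes, not characters, so the data is measured as UTF-8.
function encodeChunk(data) {
  const size = new TextEncoder().encode(data).length;
  return size.toString(16) + "\r\n" + data + "\r\n";
}

// A zero-length chunk followed by a blank line ends the body.
const LAST_CHUNK = "0\r\n\r\n";

// Two chunks sent one after another, then the terminator.
const small = encodeChunk("e2e4\n");        // 5 bytes  -> size line "5"
const larger = encodeChunk("a".repeat(26)); // 26 bytes -> size line "1a"
const body = small + larger + LAST_CHUNK;
```

The size line is one hex digit for the first chunk and two for the second, which is the variable-length part: a reader cannot know how many bytes the size itself occupies until it finds the CRLF.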
May 3rd, 2006 at 9:30 pm
Yes, XHR should be extended to do that, but if it is, it should stop being called XMLHttpRequest. There are two big reasons, in fact, that it’s a bad name:
1.) XML is an abbreviation. So is HTTP. So why is XML all caps and Http is treated as a word? Either call it XmlHttpRequest or XMLHTTPRequest.
2.) As soon as it’s no longer about XML, it shouldn’t be called that. It should be called something like “TCPPipeline” or “Socket” or something, with the parts about XML and HTTP as possible arguments, even if they are default arguments.
Why? Because neatness matters, naming is part of neatness, and as hard as it will be to change this now (even if you keep around XMLHttpRequest as a wrapper), it will be near-impossible after being an official standard for years. And it’s just going to seem counter-intuitive to implement something using XMLHttpRequest that has nothing to do with XML or HTTP, and isn’t any kind of request.
May 4th, 2006 at 5:29 pm
Steve: it’s an interesting point you raise, and yes, the async flag can handle part of what I want. The polling is the lesser of the two evils, though; it’s the ability to make requests without headers that matters to me. We disagree, though, on the point about the overkill bit.
I’m not arguing that these things can’t be done through polling. What I’m arguing is that this mechanism can be applied to existing services, rather than just new ones, if we provide a mechanism to suppress the HTTP headers. That the mechanism I describe can also be used to fix the polling issue is sort of an ancillary woot.
Please note that the async flag is only a proposition at this point, and isn’t universally deployed. There is a mechanism superior to the async flag hack, and I believe I’ve described it. Yes, the async hack is neat and useful, but it’s limited in an unnecessary fashion, and it’s not the brass ring besides.
The important bit is suppressing those headers.
As far as what the overhead is, I discussed both – the time lag you mention is the degradation to the user experience I discussed. That said, don’t forget that that bandwidth cost, summed, is also potentially pretty high. I know, the bandwidth cost of polling seems trivial, but it isn’t. I’ll assume two second polling, though I’d honestly want more to support natural-seeming chat (my private ajax chat system runs at quarter-second polls, and it still feels a little laggy during heavy chat.)
Check the math:
3 bytes for the VJ header
1 byte for the minimum “is there new data” message
= 4 byte minimum message
x —-
30 polls per minute
1000 users (that’s small)
= 30,000 polls per minute
x —-
60 minutes/hour
24 hours/day
31 days/May
= 44,640 minutes/may
44,640 * 30,000 * 4 = 5,356,800,000
5.3 _gig_ of traffic for a four byte poll every other second for a thousand users.
Now, when you consider that some polls can’t be one byte, things go up. If you want to poll more than every 2 seconds, things go up. If you want more users at once than a measly thousand, things go up. Let’s consider a more realistic example. Say we want to use AJAX to implement the front end to a large-scale persistent strategy game, the kind that you saw from old BBS vendors migrating to the Intarweb in its early days. I’ll use Earth: 2025, by Mehul Patel (Barren Realms Elite was the schizz, yo) from 1997 as my example.
3 bytes for the VJ header
6 byte minimum message (1 byte for “Any updates,” 2 bytes for each the center X and center Y of the current viewscreen, and one byte pointing at some enumeration listing likely portal sizes, since it’s cheaper than the four you’d need to describe portal size)
= 9 byte minimum message
x —-
60 polls per minute (1 sec _minimum_ for that game style)
75,000 users
= 4,500,000 polls / minute
x —-
60 minutes/hour
24 hours/day
31 days/May
= 44,640 minutes/may
9 * 4,500,000 * 44,640 = 1.8 TERABYTES.
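Both sums can be checked with a few lines of JavaScript. The function name `pollingBytesPerMonth` and its parameter names are made up for this sketch; the inputs are the figures used above, and like the sums above it counts only the outbound poll, not replies or lower-level packet overhead.

```javascript
// Bytes of poll traffic in one month:
// bytes per poll x polls per minute x users x minutes in the month.
function pollingBytesPerMonth({ bytesPerPoll, pollsPerMinute, users, days }) {
  const minutes = days * 24 * 60;
  return bytesPerPoll * pollsPerMinute * users * minutes;
}

// Chat example: 4-byte poll every two seconds, 1000 users, 31 days.
const chat = pollingBytesPerMonth({
  bytesPerPoll: 4, pollsPerMinute: 30, users: 1000, days: 31,
});

// Game example: 9-byte poll every second, 75,000 users, 31 days.
const game = pollingBytesPerMonth({
  bytesPerPoll: 9, pollsPerMinute: 60, users: 75000, days: 31,
});

console.log(chat); // 5356800000     (about 5.4 billion bytes)
console.log(game); // 1807920000000  (about 1.8 trillion bytes)
```

Changing any one input scales the total linearly, which is why faster polling or more users gets expensive so quickly.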
Those are realistic numbers. Earth: 2025 got quite a bit bigger than that for a while. It was ad-driven, not subscription driven, but still, when you’ve got 75K concurrent users, you can afford 1.8T traffic. That’s $200/mo worth of traffic from ServerPronto, not a huge stupid problem.
But, that’s $200/mo for a moderate sized web game’s ping packets *alone,* before you consider replies, before you consider the actual game traffic, before you consider the mammoth amount of congestion that seventy five thousand unnecessary packets per second create, the frequent delays in routing that will result from other data which should be able to cope, et cetera. This means extra servers, extra delays, lost traffic, packet collisions and packet repeats out the wazoo. It’s a tremendous, unnecessary waste.
And, before you tell yourself those are unrealistic numbers, go log onto one of the larger IRC networks like efnet or undernet, join #help, and ask about the traffic one of their nodes sees. Efnet hosts about 150k people at once, or roughly double my prior estimate. Once you find out what their real traffic is like, imagine imposing 3.6 terabytes of traffic and 150k packets/second on top of that.
The polling traffic is significant. Still, the headers are what really gets me.
May 4th, 2006 at 5:32 pm
David: it’s already stopped being about XML, hence the .responseText property. Whereas I agree that in retrospect the name has become silly, you might as well point out that HTTP is used for lots of other things than transporting hypertext now. Changing the name would at this point cause more trouble than keeping it; that’s why it’s important that changes be made early.
All the more reason to get the fix I describe for headers and polling in now, rather than later.
August 5th, 2008 at 10:13 pm
Flash, as I’m sure you know, can do the sockets you mention. And interestingly enough, with some clever javascript, you can wrap an invisible SWF with an interface that behaves exactly like the native XHR object, meaning it can be used as a drop-in replacement.
flXHR (http://flxhr.flensed.com/) has done essentially this exact thing, although it’s not designed to leverage the sockets but just regular cross-domain communication (using Adobe’s security model — server opt-in policies). But the concept could (and probably will eventually!) be extended to sockets as well.
And, to the point of naming, yes, it’s a misnomer to some extent, but recognizability and compatibility are also very important factors. If you start renaming things, you’ll have a whole slew of authors/developers who will get shaken up, and a whole set of code that’ll have to change.
Besides, the name can still suggest the *best* (subjective opinion, context-sensitive I know!) type of data transmission packaging (XML), even though the object itself can be more flexible.
August 6th, 2008 at 1:17 pm
Mr Simpson:
Yes, Flash can do raw sockets, but Flash has its own set of problems, such as the crossdomain opt-in you mention; that means that Flash can’t be used as a thin client for services which do not expect it.
This is essentially the definition of epic fail, IMO. This is why there are, for example, no flash FTP or mail clients, whereas there should be. Flex is a canonical example of an application platform that’s missing the boat in the name of naive security policies. There is no reason in the world that the Flash player couldn’t just request permission from the user, like it does for disk space.
August 11th, 2008 at 6:23 am
@John- I don’t believe the server opt-in is a “problem” as you assert — in fact, quite the opposite, I believe it’s a strong selling point for flash+javascript solutions for cross-domain communication. It’s currently the only really viable solution for controlled/authorized/’secure’ cross-domain communication.
And plenty of sites, like Flickr, YouTube, etc, have made ‘*’ cross-domain policies on their API so any site can gain access, which IMHO shows off the power of the Flash security model — for sites who need to restrict, the power is there, and for sites which want to open up to the whole world, the power is also there.
Is Flash’s model perfect? Heck no. Does it need some work? Definitely. But I believe it’s the best thing we have going right now. And I hope the next-gen XHR/XDR’s will take some lessons from what Adobe has done (well and poorly!).
And btw, there are in fact Flash-based mail clients already, and I don’t see why there couldn’t be (admittedly limited-use) FTP clients as well. But yeah, their ability to be targeted at *any* location would be restricted by the server opting in.
But again, this is the whole point of the model — to be ‘secure’ (at least from arbitrary client-side proxying) if a server wants to. The fact is, client-side proxying is way too easy to be exploited in XSS attacks if the parties involved aren’t specifically tailoring their services to be immune to such problems.
And how does Flex as an app platform have anything to do with this at all? Flex is just an easy, quick way to mark up a Flash UI and have it auto-generated.
But to your point about user-permission requests… Yes, there might be some benefit in allowing that to happen. But it definitely degrades user-experience. Users probably wouldn’t want to have to authorize every dang x-domain communication connection, and most of them wouldn’t even really be informed enough to understand if they should or shouldn’t do it… which would probably lead to poorly chosen default settings and overly-restrictive browser impositions.
The decision should be in the hands of intelligent developers/authors. That’s why I believe server opt-in is a strong, future-thinking model.
October 12th, 2009 at 2:03 pm
I’ve been wanting to do a similar js-based front end for the FICS server. It seems a Java applet is the best solution for doing sockets communication with JS (example: http://stephengware.com/projects/javasocketbridge/) Of course, the downside is you’re using an applet… Would you have any interest in open-sourcing your style13-2/style12 parsers so the rest of us don’t need to reinvent the wheel?
October 13th, 2009 at 2:13 pm
If you can find it, you can have it. All my old public code is MIT license. That was before I started aggregating my code in a public repository ( http://scutil.com/ ), so I’m not actually sure where it is.