The people  
Coding standards  
Design  
Rewrite rule  
TODO  
DONE  
Sun Java  
Javadocs  
Class Hierarchy  
Index  
NAWS  
MTRX  
security  
servlets  
Skang  
Squeal & Stuff  
zen developer  
MTRX protocol
MTRX - a contraction of MaTRiX, Minimalised Transmit & Receive eXchange, Messy
TRiX, Miserly TRansaction eXecutor, whatever way you want to justify it, call it
MTRX. MTRX is an effort to reduce the number of bytes sent across the wire for
the most often used Internet protocols, thus reducing the bandwidth usage of the
entire Internet. This should result in cheaper access for all, and a quicker,
snappier Internet. Possible side effects are a reduction in CPU usage at both
the client and server ends, and a reduction in file storage requirements for web
servers if they pretokenise.
MTRX protocol will soon be added to matrix-DFS, an open source project written by
the author, as the Transmogrifier module.
One end or the other ID's itself as MTRX aware, then the other iniates the switch.
> 00 03 0123 0234 0543 (RFC 3 x y z) [Switch to MTRX, use these RFC tables.]
< RFC 3 MD5(xTable) MD5(0) MD5(zTable) [I only know these tables.]
> yTable [Here is the missing table.]
< RFC 3 MD5(xTable) MD5(yTable) MD5(zTable) [I now know these tables.]
> YEP! [We agree, let's do it.]
> 00 03 0123 0234 0543 (RFC 3 x y z) [Switch to MTRX, use these RFC tables.]
< RFC 3 MD5(xTable) MD5(0) MD5(zTable) [I only know these tables.]
> VERSION(zTable) yTable [We disagree on what z is, but here is y.]
< VERSION(zTable) [And this is my version. If I have the later version, I can skip this step.]
[Holder of latest version] zTable
[Holder of older version] RFC 3 MD5(xTable) MD5(yTable) MD5(zTable) [I now know these tables.]
[Holder of latest version] YEP! [We agree, let's do it.]
Tables sent during this handshake are only used between these two machines. The one with the missing
or out of date table should get an upto date table from a trusted table server as soon as possible.
We should call table servers "waiter". When listing tables, they are always listed in ascending order.
MTRX reserves the first 16 token codes (80 - 8F) and the first 32 ASCII codes (which are
unprintable anyway). Some of those first ASCII codes are actually reserved for RFC use (CR, LF,
etc). Some of these reserved codes make up the default table, and the default table can depend on
the URL, well known port number, or expected protocol. Only do default tables for the basic tokens
of common protocols, so the "request for tables" code is now several "request for tables, using
this very well known default table". These "very well known default tables" are hard coded into
every MTRX aware client and server, but can be different for different MTRX protocol
versions.
The structure of generic binary values is token, length, data, delimiter. For data that is fixed
length, the length and delimiter is not sent. For data that can handle a delimiter, the length is not sent.
For raw binary data, where there is no delimiter byte available, the length is sent, but obviously no
delimiter. As mentioned below, token can be one or two bytes. Length is one byte, unless that byte is FF,
then the next two bytes are the length. For tokens that represent zero or more repeats of fixed length
data, the length is actually a count. Typical delimited data is ASCII strings, where the delimiter can be
LF, 00, or any token. A token sent as delimiter is interpretted as the token for the next value.
Multiline ASCII data can include CR's and LF's as the data, so exclude them from the set of delimiters
if the value is multiline ASCII. Note that CR was left out of the list of delimiters, if they are sent,
drop them, and don't send them.
For a single connection, the connection table is made up of the agreed tables like this. Starting
from the first unused code (some are reserved for MTRX), each token described in the first
RFC is added to the table, in the order they are described. Then the next RFC has it's tokens added,
starting from the next unused code. Rinse, repeat. If the total number of tokens is less than 128,
then use 8 bit codes. If the total number of tokens is less than 65535 - 128, then use 16 bit codes.
This allows for 24, 32, etc bit codes to be used in the future, if ever needed. There may be instances
where the same token exists in two or more tables, when adding the other tables, search the current big
table for the token before adding it, and don't add it if it already exists. This should scale well,
as some RFC's depend on definitions from earlier RFC's.
RFC's that define a text based wire protocol get assigned their RFC number. External (non-RFC) text
based wire protocols get assigned otherwise unassigned a RFC number per version (that is, those RFC's
that don't define tokenisable protocols). The final 16 table numbers are reserved for experimental /
temporary use (FFFx), if you see them in a request for tables, assume you don't have them, and discard
them after the connection.
If a protocol has an XML DTD, then we should be able to automatically parse the DTD to produce the table.
Servers that store files to be sent across the wire, where the files can be tokenised, should store the
files in a tokenised form, this will save on storage bandwidth and memory usage. We should define a
standard method to keep track of what files are tokenised, and with what version of what table. Link
this with any compressible content scheme the server already has, and let the server decide.
Since the RFC's are immutable, and any particular server will generally expect to only talk a few
protocols (MIME, HTTP, and HTML for example), I expect that the combined tables don't have to be
recaclulated for every connection, several typical versions can be cached. Programs using MTRX
tokenisation can typically save some CPU cycles in parsing the wire protocols. The entire MTRX
protocol tokenisation scheme should scale well.
To help with scaling, hosts should cache detials of table differences with particular IP's, and offer
the different table first next time it negotiates with a host at that particular IP. If the cached
table is different from what is on offer this time around, drop the cached table and renegotiate.
Server tables can be cached for a long time, client tables should only be cached for an hour (possible
dynamic IP).
I should write a TELNET replacement for debugging tokenised connections. I could easily do it as a
Skang module and it would be useful for the initial protocol testing. I should also break the Java
only nature of matrix-DFS and write some generic, high speed C code to support tokenisation. Hmm,
tokenise(char *string) and untokenise(void *bytes, int length), both return a linked list of structures
that have the negotiated table, token, standard table, standard token, value type, void pointer to value,
value as string, token as string, tokenised bytes, length of tokenised bytes, and RT (see below).
I should write them so that partial buffers can be processed. Provide for convenience wrappers
and call back routines.
Reversed file transfers
Two tokens need to be assigned to support reversed file transfers. A reversed
file transfer is any transfer of a file done in reverse, starting from the last
byte in the file, and progressing a byte at a time towards the first byte of
the file. Strange as it may seem there are good reasons for doing this.
Anytime a large, unstructured (as far as the application protocol is aware) blob
of data is transferred, either end may send the "Reversed file transfer" token.
The other end must reply with the "Reversed file transfer" token. This will cause
the very next file transfer to be done in a reversed manner.
Anytime a file server knows that a file is also available from an alternate source,
it should send a "partial redirect" token, with the value being a URL pointing to
the altranate source. The only restrictions on this URL are that the alternate
server must be able to handle reversed transfers, and the file must be identical.
Note that the actual file transfer protocol may be different. The original server
serves the file in the usual way, and the alternate server serves the file in
reverse. When the client notices that the entire file has been sent (the servers
meet in the middle), then it stops both transfers. MD5 checksums and other file
metadata should be used to make sure the files are identical.
My research has shown that a typical file transfer over a dial up modem uses
between one third and two thirds of the total bandwidth, varying between those
during the life of the transfer, generally being peaky. Two file transfers
will saturate the link, while having minimal impact on the transfer speed of
either. On high speed links, one transfer will usually not come close to
saturation (the average speed of the Internet is a little slower than dial up
speeds). Two transfers over a high speed link, from different servers,
typically won't saturate the link, but will double the number of bytes per
second being transferred. I wish to take advantage of these facts to double
the speed of file transfers by use of reverse transfers and partial
redirections. This will also reduce the load on file servers, as each one will
only have to serve half length files.
Since high usage file repositories are often mirrored, this entire process
should result in the load being spread more evenly between mirrors, and users
getting their files at twice the speed for the majority of files transferred.
Two things to support this are outside the scope of this document, modifying
current file transfer protocols to support reverse transfers, and
infrastructure to support the identification of alternate file sources. File
transfer protocols don't need to be modified straight away, as it is expected
that initial deployment will use proxies and wrapper modules to make the
process transparent to both client and server. Infrastructure to support
mirroring is already in place, this just automates the use and standardises
a client - server protocol.
Local proxies will be used to allow legacy browsers to use the MTRX protocol.
The legacy browser sends all requests to it's local proxy. The local proxy
identifies MTRX aware servers and intermediate proxies, tokenises the transfer,
and handles partial redirects. At the server end, wrappers could intercept MTRX
protocol, detokenise it, and pass it on to the real server. For reverse file
transfers, the wrapper could create a temporary file, copy bytes into it in
reverse order, then rewrite the file request to point to the new file, before
passing it on to the real server. Server wrappers could also interface to
whatever mirror system the server operator is using to produce partial redirect
requests. This can be used for automated load balancing, instead of "pick the
nearest mirror" web pages.
matrix-DFS, an open source project written by the author, will provide suitable
local proxies written in very portable Java, and modules for Apache, the worlds
most popular web server. matrix-DFS will also include it's own mirroring
infrastructure.
Reserved tokens
- 00 - ACK, request default table 0, a delimiter.
- 01 - NAK, request default table 1.
- - Reverse the next file transfer.
- - partial redirect, start reverse transfer with this URL.
- 0A - LF, a delimiter.
- - CR, a delimiter.
Effectiveness
Since the whole point of this is to decrease the amount of wire traffic at the application level, some
means of tracking the effectiveness of tokenisation is required.
The RT, ART, and TART metrics measure the number of bytes saved. R stands for "Received", T stands for
"Transmitted, A stands for "Average", and the other T stands for "Total". Thus the RT metric is a
measure of bytes saved for this particular file, the ART metric is the average of the bytes saved
over this connection, and the TART metric is the total bytes saved over many connections.
The format of the metric as a text string is the captial letters "RT", followed by a plus or minus sign,
followed by the bytes saved on the incoming stream, followed by a plus or minus sign, and finally the
bytes saved on the outgoing stream. The first plus sign is optional. Some examples to clarify -
- RT+3221-12 - 3221 bytes saved on incoming stream, 12 bytes more used on outgoing stream.
- RT7434+541 - perfect, saved bytes on both streams.
- RT-152-185 - oops, didn't save bytes on either stream.
- RT+0+12345 - broke even on the incoming stream, but saved heaps on the outgoing stream.
For a binary representation, one of the reserved tokens will be used, followed by two signed
16 bit integers, signed 32 bit integers for ART, and signed 64 bit integers for TART. The table
negotiation is included in the byte count. Systems that log transfers should add the RT for the
transfer to the log. Systems that display transfer details should display the RT for the current
transfer, and the ART since the connection begun. Programs should be written to process log files
and produce a TART from the logged RT values.
Hosts are encouraged to precalculate RT for any particular connection, if at all possible, and if
there is no overall saving of bytes, don't tokenise.
Security
Have we introduced any new security issues? The short answer is no. Lets analyse this.
First the problem of a malicious end point (MEP). Tokenisation works as a temporary mutually agreed
short hand between two systems. Any permenant update to a hosts tables should only come from trusted
hosts. Thus the only vector of attack for a MEP is to lie about the table used. This is the
equivilent of sending meaningless data in an untokenised format. No new security concerns there.
Next is the man in the middle attack (MIM). A MIM could spoof untokenised data as easily as tokenised
data. No new security concerns there.
Finally, buffer overflows and similar malformed data. In both the MEP and MIM cases above, the malicious
party could try to trigger security holes. While tokenisation introduces another level of complexity,
thus introducing more oppurtunities for failure, it can provide an extra measure of data integrity
checking. For the purposes of analysing security concerns introduced by tokenisation, we can safely
ignore security concerns of the underlying untokenised data. Only in the case of a malformed token string
are new security concerns raised. These would be in three forms, underrun, overrun, and unexpected data.
The token could be outside of the range of agreed upon tokens, this can very easily be checked by the
recieving party. This should be treated the same as a malformed untokenised data stream. No new security
concerns.
The length of the value could be wrong, a potential overrun or underrun. Overruns and underruns can
happen in untokenised data streams, the results will be the same, either the other host is left waiting
for more data (underrun), or a buffer is smaller than needed (overrun). No new security concerns.
Missing delimiters can happen in tokenised and untokenised data streams. Due to the nature of the
tokenisation system, delimiters are less likely than in untokenised data, and there is a greater range of
allowed delimiters (the token for the next token string delimits the previous token string). So that is
one problem that is potentially reduced.
This leaves us with the problem of the actual value itself being malformed, for example a dodgy date or
integer, or a meaningless string. Since the untokenised data stream could create such malformed values
just as easily as the tokenised stream, there are no new security concerns here.
The only real security concern is permenant updates to a hosts tables. The proposed solution is to
only used negotiated tables for the current connection, and update permenant tables from a trusted
host. The specification of trusted host protocols is outside the scope of this document. The usual
methods of exchanging secure data with an authenticated host can be used.
Rambling thoughts
I have been having ideas about meta protocols, and I have mentioned this
phrase a few times during the first meeting. I will talk about a meta protocol
in the context of the sking language (Skang) and the BPA login protocol (BIDS2
as used in my login client). These are the test beds I have been using to
develop my ideas, and they are the only publicly available parts of matrix-DFS
so far, so you can refer to the source and documentation to help understand
what I am talking about. These two things are why I am part of the matrix-DFS
project, as you will see, there is a good match between matrix-DFS, and where I am
headed with my login client.
Skang was always on the cards to be exactly what it is now, and what we need
it to become to support runestones ideas of what a GUI should look like. I am
lucky that I had some time before a (paying) project that needed it to develop
Skang, and that the boss also asked me to redo the GUI for an old project, so
I re did it in Skang, and told him he could edit the Skang himself. He knows
that Skang is open source, I used that as a selling point B-). He was worried
about people stealing applets, since like all things sent across the web, it
is trivial to steal. If all the applets I write for him in the future are very
thin extensions of Skang, then there is nothing to steal, as it is open source
anyway B-). The actual bussiness logic he pays me for is running on his
server, where it is harder to steal.
Actually, I am writing this now cause it has a hold of my head and won't let
me get to sleep, so this is more of a ramble. I'm an insomniac that can be
kept awake by a good challenging project rattling around inside my head.
Sometimes I just have to get up after a few hours of not sleeping, just so I
can get these ideas out of my head and into a computer somewhere. Since I have
a meeting later today, I will have to live on caffiene for the rest of the day.
A meta protocol is a top level protocol that can handle many different ordinary
protocols. In this case, it should be able to handle and manage the usual
internet protocols (TFTP, FTP, HTTP, NNTP, SMTP, POP, talk, etc) and whatever
more efficient protocols we decide to design in the future.
One of the useful concepts that has come from "The cathedral and the bazaar"
and other related works on open source software is that you should expect to
throw one away, and if you expect to throw one away, you will throw two away.
This boils down to the fact that we should feel free to experiment, rather
than designing things and setting them in stone forever. This is the method I
have been using with Skang, and the method I intend to use to fold the BIDS2
into Skang.
Skang is a text based protocol, although more of a langauge that can be
network based rather than a protocol. I have made lots of noise in the past
about how most of the internet is based on text protocols, and that it is
wasteful. However, we do have to keep in mind peoples inertia when it comes
time to convince the world to do it our way though. The only good reason to
use text protocols is that they are human readable, experts can telnet to any
HTTP port and just start typing HTTP commands, then read the result.
The people that are now used to using text protocols can go on
doing so, but when it is one computer talking directly to another, they should
shift to binary and save bandwidth.
BIDS2 is a binary protocol, but it does waste some space on specifying things
like the length of a 16 bit integer. It is extendable though, and for our
purposes, we can feel free to deviate from it. Skang is also extendable.
For the binary protocol, I shall describe the basics of BIDS2. All integers
are big endian (network order, high byte first). Since all parameters include
a length, strings do not have an end byte (typically 0). A message consists of
16 unsigned bits of message type, 16 unsigned bits of message length, the
session ID, and optional parameters. A parameter consists of 16 unsigned bits
of parameter type, 16 unsigned bits of parameter length, and the actual
parameter.
Parameters can be one of 16 bit integer, 32 bit integer, UTF8 string, and byte
array. I just had a thought, if we use byte arrays to send more complex
structures, then we are not left with the problem of constantly sending 16 bits
of information that describe the length of integers. Then we can stick to the
BIDS2 protocl, and reuse all the code in my login client. The parameter types
are things like "user name", "password", "port", etc, so that if 23 means
"port", then the parameter type is 23, the length is 2, and the parameter is
16 bits represemting the port number.
The very first thing that BIDS2 does is to negotiate the protocol to use.
BIDS2 only has one protocol. I have extended it already by adding the usage
meter protocol I have discussed extensively on BPC newsgroups. That can be
used as an example of the protocol in use.
The session ID and lengths are there as part of the security and reliability,
and we can keep using them for the same purpose. We could even use the BIDS2
protocol directly as our own login protocol B-). That is precisely what I was
planning on doing for the Games@BPC server tracker, so I have already designed
my login client to support that. The idea is that after the BIDS2 protocol has
been used to log the game server onto BPC, it uses exactly the same protocol
to login to the tracking server, and then uses an extension of BIDS2 to send
some details (what game and mods, current IP, how many players, ping times,
etc).
Since the BIDS2 protocol has provision to notice when a user drops out, and I
have already added code to check the health of the network, very little code
needed to be added to the client, I just needed to write the server.
The same deal for my automatic free sites update and checking, use BIDS like
protocols to connect to a server run by Eric to grab the free sites list he
keeps up to date. I also was thinking of a reality checker, feed the usage
report through the login client, and it compares it to the free sites list, if
there are free sites listed, output those in a format that can be sent to Eric
so he can update the list, or the user can use the output to get a refund.
Since the login client was running the usage meter protocol to display the
usage sent from a background usage collector (possibly running on another
machine, cause Skang can run as an applet), you can see how I was planning on
tieing this all together B-). Login client feeding free sites updates to usage
collecters and other fun stuff.
Not to mention that I have some Java code from another BPC user specifically so
I could include it in the login client to update DNS servers when IP's change;
I found that my login client was much more robust than DHCP clients, so I was
planning on adding DHCP client stuff; and the network health checker works by
testing things like web page last modified date, and whether or not you have
new mail (if it can't contact ALL the servers you want checked, then the
network MUST be down), and doing all this a lot quicker than BIDS2 will notice
that something is wrong (can take half an hour for some other clients, mine
notices in a few minutes, and I know how to get that time down to less than a
minute). BPC Network Organizer (BPC_NO) was gonna be a complete package.
Now most of that functionality will get folded into the matrix-DFS project (I
doubt if there will be any free sites any more, so we no longer need that). I
have been thinking this through all year. I was also gonna add a MB donation
system to matrix-DFS, to complement the FTP mirror project (tell your matrix-DFS
client that you want to donate 20MB to the mirrors this month, it takes care
of the rest), I guess that's some more code we don't have to worry about
writing B-(.
Hmm, since we should make people put up a server for any protocol they wish to
define a table for (so there is a reference server), we can have their server be
the master server for their table, and automatically increment the version
number when they change it. They pay for the ownership of the table name, and
the security needed for them to be a master server. We can use x-UENP (Unknown
Experimental Net Protocol) in the same way it is used for MIME and email headers,
anybody can design a protocol table beginning with "x-" and use it for
experimental work without the need to pay a fee, just don't be surprised if
someone else uses the same table name. Since we want to be RFC friendly, we can
reserve the names beginning with RFC for tables we make ourselves that support
the official RFC specified protocols. We should then try to implement ALL the
RFC's as appropriate, even the mattar transmission one. Note, this is the tables
I'm talking about, not the protocols. In other words, we will write a matter
transmitter table to compress it's data stream, but we won't bother writing any
code to transmit matter B-).
A typical text protocol will advertise it's name and version number as part of
the initial handshake. Some of them allow other text on the same line, or some sort
of extra text on another line. We could use "HTTP/1.0 200 OK" followed by
"Server: MTRX/2.3" to mean that version 1.0 of the HTTP protocol as extended
by version 2.3 of the matrix-DFS meta-protocol (MTRX) can be used by this server. If the
client's reply includes "MTRX/1.9" then the server must use version 1.9 of the
matrix-DFS meta-protocol, but can suggest to the client that it get upgraded, with
a possible (user informed) auto-upgrade. If there is no "MTRX/?.?" in the reply,
the client is not MTRX-aware and the server has to talk ordinary HTTP v1.0 to
this client. MTRX aware servers could include a link on their home page that points to
the matrix-DFS site for non-MTRX-aware clients. "This site best veiwed with
matrix-DFS." There's a job for you runestone, design the graphic for that link.
This file was last modified on Sunday, 19-Sep-2004 03:36:41 EST