PART 2: SUGGESTED SPECIFIC VALUES
PROPOSED PACKETS AND THROUGHPUT ANALYSIS
Given these constraints, we propose the following data packet. Each packet would include:
- Egg ID (2 bytes)
- Option information (8-20 bytes, including sample/trial size, etc.)
- Up to 60 records, each containing
  - Beginning-of-second timestamp (referenced to Jan 1, 1970 UTC)
  - Up to 10 bytes of trial data (because <= 10 trials/sec)
- 16-bit CRC checksum
This gives us a maximum packet size
of less than 1000 bytes, which should work well with most network MTUs (and the number of
records can be reduced to create a packet as small as 38 bytes, which still contains over
25% data).
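As a rough illustration of this layout, the following Python sketch packs one packet and checks its size. It assumes a 4-byte timestamp and a 20-byte option block (neither size is fixed above, but together they reproduce the 38-byte minimum), pads short records with zero bytes, and uses the CRC-CCITT routine from the standard library as one possible 16-bit CRC. The function and constant names are hypothetical.

```python
import binascii
import struct

EGG_ID_BYTES = 2
OPTION_BYTES = 20     # assumption: upper end of the 8-20 byte range
TIMESTAMP_BYTES = 4   # assumption: 32-bit seconds since Jan 1, 1970 UTC
TRIAL_BYTES = 10      # one byte per trial, at most 10 trials/sec
MAX_RECORDS = 60


def build_packet(egg_id, options, records):
    """Pack (timestamp, trials) records into one packet with a trailing CRC-16."""
    if not 1 <= len(records) <= MAX_RECORDS:
        raise ValueError("a packet carries 1 to 60 records")
    if len(options) != OPTION_BYTES:
        raise ValueError("option block must be exactly OPTION_BYTES long")
    body = struct.pack(">H", egg_id) + options
    for timestamp, trials in records:
        if len(trials) > TRIAL_BYTES:
            raise ValueError("at most 10 trial bytes per record")
        # Assumption: pad short records with zeros so every record is 14 bytes.
        body += struct.pack(">I", timestamp) + trials.ljust(TRIAL_BYTES, b"\x00")
    crc = binascii.crc_hqx(body, 0xFFFF)  # CRC-CCITT; the proposal names no polynomial
    return body + struct.pack(">H", crc)


options = bytes(OPTION_BYTES)
smallest = build_packet(1, options, [(0, bytes(10))])
largest = build_packet(1, options, [(t, bytes(10)) for t in range(60)])
print(len(smallest), len(largest))  # 38 864
```

Under these assumptions a full 60-record packet comes to 864 bytes before any transport framing, which is consistent with the "less than 1000 bytes" bound.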
Limiting the number of records to 60
and using seconds as a time stamp means that we will need to transfer a maximum of one
packet per minute. The first data record sent would be the one indicated by the
last-data-received from the Basket value (this allows some cross-verification that's
probably not necessary), and the final record sent would correspond to the
most-recently-acquired data. An optional connect-time limit could be set to help hold down
local costs (but it would have to be set high enough that all the data would eventually
get through).
We next consider the latencies and
bandwidths of various connection types, and their implications for throughput of data.
Here are some initial estimates:
Conn type              | Conditions  | Latency (msec)    | Bandwidth (bps)
WWW                    | routing     | 30-3000 (100 typ) | high
Analog modem           | new connect | 20k               | 14.4k
Analog modem           | est connect | 300               | 14.4k
ISDN modem             | new connect | 5k                | 128k
ISDN modem             | est connect | 100               | 128k
Direct Ethernet        |             | <5                | 6M
Application turnaround |             | 1000              |
A full communication scenario
might look like the following. The Egg would dial the server and tell it that it was online.
The server (with some application level latency) would provide the Egg with a packet that
included both configuration information and last-record-received (timestamp) information.
The Egg would then begin sending data packets.
We can now get a fair estimate of
the actual cost in telephone time using this scenario. We model an Egg taking data at 10
trials per second, and dialing in hourly via 14.4kbps modem. Counting in the various
expected latency terms, and ignoring data errors, retries, and compression, we get the
following rough estimates:
Total records sent:            | 3600
Total data packets sent:       | 60 (982 bytes or 750 msec each)
Total ctrl packets sent:       | 2
Total ack packets sent:        | 62
Round trip packet+ack latency: | 1550 msec

Dial time:                     | 20 sec
Control conversation:          | 5.13 sec
Data transfer:                 | 93.01 sec
Total:                         | 118.14 sec, 2835.27 sec/day
(Making the same estimate
based purely on bandwidth would be off by over a factor of four.) Generating the same
numbers for other dial-in frequencies gives:
once/minute    | 27 sec/call   | 640 min/day
once/5 minutes | 33 sec/call   | 158 min/day
twice/hour     | 72 sec/call   | 57 min/day
once/hour      | 118 sec/call  | 47 min/day
four/day       | 397 sec/call  | 39 min/day
once/day       | 2257 sec/call | 37 min/day
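The hourly estimate can be approximated with a very simple model: a fixed dial time, a fixed control conversation, and one 1550 msec round trip per 60-record packet. The Python sketch below is only that, a sketch. It takes the 5.13 sec control figure as given rather than deriving it, assumes one packet per minute of stored data, and uses hypothetical function names. It is not necessarily the exact calculation used for the figures above.

```python
DIAL_S = 20.0        # new analog modem connection
CONTROL_S = 5.13     # control conversation, taken as given from the estimate above
ROUND_TRIP_S = 1.55  # one data packet plus its ack


def call_seconds(minutes_between_calls):
    """Estimated connect time for one call that uploads the backlog since the last call."""
    packets = minutes_between_calls  # assumption: one 60-record packet per minute of data
    return DIAL_S + CONTROL_S + packets * ROUND_TRIP_S


def minutes_per_day(minutes_between_calls):
    """Total connect time per day, in minutes, at a given calling interval."""
    calls_per_day = 24 * 60 / minutes_between_calls
    return call_seconds(minutes_between_calls) * calls_per_day / 60


print(round(call_seconds(60), 2), round(minutes_per_day(60), 1))  # 118.13 47.3
print(round(call_seconds(1), 2), round(minutes_per_day(1)))       # 26.68 640
```

The fixed 25 seconds of dialing and control traffic dominates short calls, which is why calling every minute costs far more connect time per day than calling hourly.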
By dropping the sampling rate to one trial/sec, the hourly connection time goes down to
less than one third:
once/hour (1 trial/sec) | 34 sec/call | 14 min/day
These numbers should help give some
idea of the costs likely to be experienced by the remote dial-and-drop sites. Clearly we
need to be sensitive to any costs being imposed on the Egg-site hosts by
"administrative changes"!
BROADER NETWORKING AND
PROTOCOL ISSUES
As part of the networking process,
all the Egg-sites must share a common clock. This is exactly the purpose of the standard
NTP protocol, and its implementation on the expected platform (Linux) appears to be such
that it compensates for both clock value and clock *drift*, so that the resulting
uniformity of clock time is better than the typical one second resolution transferred in
the protocol. We recommend using NTP, with the Baskets serving as second-tier servers from
some other canonical source and with all of the Eggs periodically synchronizing themselves
to the Basket time. However, there may be a need to designate one Basket as primary, to
resolve any discrepancy between Baskets.
I recommend the following
relationship between the permanent and dial-and-drop scenarios. In the case of
dial-and-drop, the Egg will send a packet to the Basket which simply serves to
communicate, "I'm online now." At this point, the Basket will reply with a
packet describing the options and indicating the last successfully received index value.
Once this packet arrives at the Egg, the Egg begins sending data from this point until all
its data has been sent.
In the case of a permanent
connection, the Egg can elect to send an "I'm online now" message at whatever
interval it desires, and the protocol continues as above. The server need never know the
difference between the two types of Eggs. (However, if we prefer, the Basket can send the
"last-received" packet whenever it wishes to collect data. It must then know not
to ask dial-and-drop Eggs, or to expect no response from them.) In either case, the body
of the protocol is identical at both ends, and only the initiation changes.
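The exchange described above can be sketched as a small in-memory simulation in Python. This is an illustration under stated assumptions, not a wire protocol: the class, method, and field names are hypothetical, the "index value" is taken to be the record timestamp, and the record at the last-received index is re-sent so the Basket can cross-check it.

```python
class Basket:
    """Server side: remembers the newest timestamp received from each Egg."""

    def __init__(self, options):
        self.options = options
        self.last_received = {}  # egg_id -> timestamp of newest stored record
        self.store = {}          # egg_id -> {timestamp: trial bytes}

    def on_online(self, egg_id):
        # Reply to "I'm online now" with the options and the resume point.
        return {"options": self.options,
                "last_received": self.last_received.get(egg_id, 0)}

    def on_data(self, egg_id, records):
        # Store one packet's worth of (timestamp, trials) records and acknowledge.
        self.store.setdefault(egg_id, {}).update(records)
        self.last_received[egg_id] = max(t for t, _ in records)
        return "ack"


def egg_session(egg_id, local_records, basket, max_records=60):
    """Egg side: announce, read the reply, then send everything from the resume point."""
    reply = basket.on_online(egg_id)
    pending = sorted((t, d) for t, d in local_records.items()
                     if t >= reply["last_received"])
    for i in range(0, len(pending), max_records):
        basket.on_data(egg_id, pending[i:i + max_records])
    return len(pending)


basket = Basket(options={"trials_per_sec": 10})
data = {t: bytes(10) for t in range(100, 250)}
first = egg_session(7, data, basket)
data.update({t: bytes(10) for t in range(250, 260)})
second = egg_session(7, data, basket)
print(first, second)  # 150 11
```

The same `egg_session` call serves both kinds of Egg: a dial-and-drop Egg runs it after dialing, while a permanently connected Egg runs it at whatever interval it chooses.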
The Basket should still have
responsibility for monitoring the "aliveness" (fertility?) of particular Eggs,
and notifying human administrators if a particular Egg seems to be down or partitioned off
from the network.
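A minimal sketch of such an aliveness check, in Python, might simply compare each Egg's last contact time with a silence threshold. The threshold value and all names here are assumptions chosen for illustration.

```python
def silent_eggs(last_contact, now, max_silence_s):
    """Return IDs of Eggs whose last contact is older than max_silence_s seconds."""
    return sorted(egg for egg, seen in last_contact.items()
                  if now - seen > max_silence_s)


last_contact = {1: 9_000, 2: 1_000, 3: 9_900}  # egg_id -> time of last packet (sec)
overdue = silent_eggs(last_contact, now=10_000, max_silence_s=2 * 3600)
print(overdue)  # [2]
```

For dial-and-drop Eggs the threshold would need to be longer than the configured dial-in interval, so that a normal gap between calls is not reported as an outage.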
It is probably desirable to be able
to set at least some of the options from the Basket, so that the administration of the
Eggs does not require extensive involvement of the personnel at each Egg-site. However, we
need to have some method of ensuring that the settings are authenticated. As a simple
security mechanism, we can assume that the Eggs will only accept updates from the Basket
they have contacted in the "I'm online now" phase, using known (fixed?) IP
addresses, and assuming the security of the routing tables against corruption. This is
essentially an IP "dial-back" approach. If this is inadequate, it is possible to
implement something like a shared-secret DES encryption scheme (à la CHAP) for
authentication. This requires substantially greater sophistication, and may not be
necessary.
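To give a sense of what a shared-secret check involves, here is a hedged Python sketch of a challenge-response exchange. It deliberately swaps in HMAC-SHA256 from the Python standard library in place of DES, and it is neither CHAP itself nor a vetted design. The function names, the secret, and the settings string are all hypothetical.

```python
import hashlib
import hmac
import os


def new_challenge():
    """Egg side: a fresh random nonce for each "I'm online now" exchange."""
    return os.urandom(16)


def sign_settings(secret, challenge, settings):
    """Basket side: tag the settings with a keyed hash bound to the challenge."""
    return hmac.new(secret, challenge + settings, hashlib.sha256).digest()


def settings_are_authentic(secret, challenge, settings, tag):
    """Egg side: accept the settings only if the tag matches."""
    expected = sign_settings(secret, challenge, settings)
    return hmac.compare_digest(expected, tag)


secret = b"hypothetical shared secret"
challenge = new_challenge()
settings = b"trials_per_sec=1"
tag = sign_settings(secret, challenge, settings)
print(settings_are_authentic(secret, challenge, settings, tag))              # True
print(settings_are_authentic(secret, challenge, b"trials_per_sec=10", tag))  # False
```

Binding the tag to a fresh challenge means a recorded reply cannot simply be replayed to the Egg in a later session.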
To help offset the impact of
ever-more-frequent network partitions, I think it is important for each Egg to know about
all the Baskets. Each Egg may even be configured to prefer a different Basket, on the
assumption that communications within a continent are cheaper or at least more reliable
than transcontinental ones. Thus, a Scandinavian Egg might report to a Dutch Basket, and a
Californian Egg to a New Jersey Basket, and in the end only the Baskets would need to
exchange information (presumably over higher bandwidth links) to get the whole picture. In
the event of a transatlantic partition, each Basket would still receive data, and each
Egg would still be able to report to its first choice of Basket. In the event of Dutch
Basket down-time (if, say, it was being borrowed by the Easter Bunny) the Scandinavian Egg
would then contact the NJ Basket directly, after noticing the missing Dutch Basket.
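The fallback behaviour can be sketched in a few lines of Python. The Basket names and the reachability test are hypothetical stand-ins; a real Egg would attempt an actual connection rather than consult a set.

```python
def choose_basket(preferred, is_reachable):
    """Return the first reachable Basket in preference order, or None."""
    for basket in preferred:
        if is_reachable(basket):
            return basket
    return None


scandinavian_egg = ["dutch-basket", "nj-basket"]  # preference order, nearest first
down = {"dutch-basket"}
chosen = choose_basket(scandinavian_egg, lambda b: b not in down)
print(chosen)  # nj-basket
```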
LOCAL EGG ISSUES
The Eggs themselves generally should not need
any display. However, many of the Egg hosts (people) may want to know what is
going on, and indeed it may be worthwhile to have at least a status display. It could be
just a text report, with indicators of time on, amount of data reported, grand deviation
(as a check whether all is well), etc.
In general we should avoid any
aspect that requires maintenance at the Egg-site. However, some Egg-site maintainers may
be comfortable with extra features that are not appropriate for everyone. These features
can be set locally at the Egg-site, with an appropriate interface that warns the
maintainer of the extra burden being taken on. These features should also be made robust
in the event of inattention.
For example, if the local sites wish
to have a data backup, one possibility is a floppy disk. However, the amount of data
generated at the maximum speed of ten trials/second would roughly fill one floppy disk per
day. This puts a high maintenance burden on the Egg-site maintainer. In contrast, running
at one trial/second would extend the life of a floppy to over ten days, which is probably
a reasonable maintenance burden for most sites. At this rate, this sort of backup adds
less than $20 per site per year of media costs. If the local site maintainer forgets to
change disks, the system should recognize this and either (1) discard the data on the disk
and start from scratch when the disk fills up, (2) stop writing data and discard data until
a new disk is available, or (3) stop writing data and queue further writes until a new
disk is available.
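The floppy lifetimes quoted above can be checked with a back-of-the-envelope calculation. The Python sketch below assumes a 1.44 MB disk and counts only raw trial data at one byte per trial, ignoring timestamps and filesystem overhead. These assumptions are ours and the names are hypothetical.

```python
FLOPPY_BYTES = 1_474_560  # assumption: a 1.44 MB 3.5-inch disk
SECONDS_PER_DAY = 86_400


def days_per_floppy(trials_per_sec, bytes_per_trial=1):
    """Days of raw trial data that fit on one disk at a given sampling rate."""
    return FLOPPY_BYTES / (trials_per_sec * bytes_per_trial * SECONDS_PER_DAY)


print(round(days_per_floppy(10), 1))  # 1.7
print(round(days_per_floppy(1), 1))   # 17.1
```

Storing a timestamp with every record would shorten both figures, so the exact lifetime depends on the on-disk format chosen.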
We would discourage
non-data-acquisition uses of the Egg machine, such as the installation of a web browser,
because it potentially increases the hardware requirements, competes for bandwidth with
the required data transfers, and interacts in somewhat unpredictable ways with the dialup
scheme. Although most of these are not concerns for permanently-connected Egg-sites, these
sites are the most likely to already have browsing capabilities. Furthermore, keeping the
hardware and software platforms uniform allows for easier "hot spare"
replacement.
LOCAL BASKET ISSUES
Some sort of utilities (probably
software on the Baskets, or perhaps even a private web area) needs to be built to help
view the performance of the network rather than just the results. Things like a global
view of connectivity, down-time ratios, and Egg type information would be useful. Using
SNMP to some extent is certainly possible, but although I would like to encourage the
usage of IETF standards as much as possible, it may be quicker to roll our own details for
this capability.
SOCIAL IMPLICATIONS
If and when our Eggs hatch, we may
need to open certain cans of worms to feed the hatchlings. One might divide the issues
into those related to the project "not working" and those related to it
"working," but of course the terms themselves need to be defined. For now we take
"working" to mean that the system detects some sort of global consciousness structure.
If it doesn't work, there will be a
need to explain why its results are different from the preliminary studies like the Diana
and Theresa work. What if it does work? It seems that discovering and being able to
measure something like global cohesion is a huge breakthrough, and we should consider how
to communicate the discovery properly. Is there also a moral significance to demonstrating
the power of group-think?
What do we do if we discover that
the mechanism measures other things of significance? Jiri has alluded to the fact that it
could equally well pick up on the consciousness of animals other than humans. If it
notices solar eclipses, it certainly has the potential to notice other significant
astronomical or geological events, or our reactions to them. One thought in particular,
given that animals are often sensitive to things that people miss, is that the system
might detect phenomena such as earthquakes before they actually occur. This possibility
alone, if it came true, would make the project extremely significant to humankind.
ACKNOWLEDGEMENTS