Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!gamma!epsilon!zeta!sabre!petrus!
bellcore!decvax!decwrl!pyramid!pesnta!amd!amdcad!cae780!leadsv!rtgvax!ramin
From: ramin@rtgvax.UUCP
Newsgroups: net.news,net.news.adm,net.unix-wizards
Subject: usenet volume problems...
Message-ID: <74@rtgvax.UUCP>
Date: Fri, 6-Jun-86 01:39:58 EDT
Article-I.D.: rtgvax.74
Posted: Fri Jun  6 01:39:58 1986
Date-Received: Sun, 8-Jun-86 05:12:36 EDT
Followup-To: net.news
Organization: Erewhon Travel
Lines: 92
Keywords: ...one solution
Xref: watmath net.news:4969 net.news.adm:778 net.unix-wizards:18317

[don't give up...]

Noticing a recent resurgence of discussion on over-loaded usenet traffic,
I thought I might suggest a solution that I've been playing with for a
while. It might help solve this issue and other network traffic problems...

Now as I understand it, every article (up to several Mbytes daily)
gets shipped to sites all around the world. Of those that I
have been looking at carefully recently, a very large number are either
cross-postings or follow-ups to older articles with large quantities
of material quoted.

The readnews software that I use here shows only one copy of a cross-posted
message, and I'm not sure whether the news software is intelligent enough to
avoid actually *sending* multiple copies... For the sake of argument I'll
assume it doesn't send them (and if it does, there's one place things can be
tightened up), but the issue is that the actual *volume* of quotes in
follow-ups constitutes a pretty major factor in the traffic problem.

Given such a premise, it appears that one solution might be to avoid sending
multiple copies of the quoted text around. The solution that immediately
comes to mind is to have the posting software replace all quoted
regions with a context-diff-style notation, except that each marker would
give not only the beginning and ending lines (or bytes) but also the
article-id, e.g. (<####@xyz.UUCP>12,14), instead of the full text.
The larger the quote, the more efficient the scheme.
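
As a rough illustration of the posting side, here is a minimal Python sketch. It assumes the poster's software has the original article's lines on hand and that quoted lines carry a "> " prefix; the function name compress_quotes and its parameters are hypothetical, not part of any existing news software. Quoted lines that were edited, and so no longer match the original, are left as literal text.

```python
def compress_quotes(followup_lines, article_id, original_lines, prefix="> "):
    """Replace runs of quoted lines that match the original article
    with markers of the form (<id>first,last). Line numbers are 1-based."""
    # Map each line of the original to its line number (first occurrence wins).
    index = {}
    for n, text in enumerate(original_lines, start=1):
        index.setdefault(text, n)

    out = []
    run_start = run_end = None  # current run of consecutive matching lines

    def flush():
        # Emit the marker for the run collected so far, if any.
        nonlocal run_start, run_end
        if run_start is not None:
            out.append(f"({article_id}{run_start},{run_end})")
            run_start = run_end = None

    for line in followup_lines:
        n = index.get(line[len(prefix):]) if line.startswith(prefix) else None
        if n is None:
            # New text, or a quote that was edited: keep it verbatim.
            flush()
            out.append(line)
        elif run_start is not None and n == run_end + 1:
            run_end = n  # extends the current run
        else:
            flush()
            run_start = run_end = n
    flush()
    return out
```

For example, with an original of four lines ("alpha", "beta", "gamma", "delta") and a follow-up quoting "beta" and "gamma" before adding a comment, the two quoted lines collapse into the single marker (<74@rtgvax.UUCP>2,3).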

The *de-quoting* step would be taken after the user has edited the file
(in "postnews", after typing *send*).  The news-reader software would then
access the article as usual, but whenever it reaches a certain escape
character in the stream, that would tell it to look up the given article
in the history file and fetch lines a to b (or bytes a to b).
Also, readnews could probably cache the file pointers to save the overhead of
reopening the file within a single message, or even within a single reading
session. It would then add the suitable indentation markers and proceed with
the rest of the message. Naturally, to allow for punsters and pundits alike (:-),
the included text would itself be searched for such escape sequences to allow
for nested follow-ups (to a pre-defined depth, with a strategy of tossing out
the top levels of a nesting, i.e. the oldest quote gets shoved out and
replaced by the most recent quotes).
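
The reading side might be sketched as follows, again in Python and again only as an illustration under stated assumptions. The dictionary named store stands in for the history-file lookup, a marker is assumed to sit alone on its own line rather than behind an escape character, and the placeholder text for a missing article is invented here. Nested markers are expanded recursively up to max_depth, beyond which the oldest quotes are dropped.

```python
import re

# A marker on a line by itself, e.g. (<74@rtgvax.UUCP>12,14)
MARKER = re.compile(r"^\((<[^>]+>)(\d+),(\d+)\)$")

def expand_quotes(lines, store, max_depth=2, prefix="> ", _depth=0):
    """Expand (<id>first,last) markers into indented quoted text.

    store maps article-id -> list of that article's lines (a stand-in
    for looking the article up through the history file)."""
    out = []
    for line in lines:
        m = MARKER.match(line)
        if m is None:
            out.append(line)  # ordinary text passes through unchanged
            continue
        if _depth >= max_depth:
            continue  # nesting limit reached: the oldest quote is tossed out
        article_id, first, last = m.group(1), int(m.group(2)), int(m.group(3))
        article = store.get(article_id)
        if article is None:
            # Article expired or not yet arrived (placeholder is an assumption).
            out.append(f"[quote unavailable: {article_id} lines {first}-{last}]")
            continue
        # The quoted lines may contain markers of their own.
        quoted = expand_quotes(article[first - 1:last], store,
                               max_depth, prefix, _depth + 1)
        out.extend(prefix + q for q in quoted)
    return out
```

Keeping the opened articles in a dictionary for the whole reading session plays the part of the cached file pointers mentioned above.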

The only problem here is obviously that the quotes cannot extend beyond
the date news is expired. The solution here can be to expire selected
newsgroups over longer periods (e.g. technical ones) and the ones
currently proposed in the talk groups with shorter periods... Since the
life of an article and its relevance would generally depend on the expiration
cycle of the newsgroup (most places I've talked to keep it at around 2 weeks)
the necessity for temporary groups such as net.politics.terror would be
eliminated, since discussions should generally run down shortly after
most of the quoted articles expire.

For important matters, a mechanism could be implemented to allow users
(or the system manager on their behalf) to request a copy of
an expired article from a central archive to access a quote beyond the 
expiration date of that article...

The only other issue I can think of is, obviously, the processing overhead.
If there is enough interest in this solution I will try to work out
some figures based on average news traffic loads and amounts of followups
across various groups (plus file access and read times, etc...) to arrive
at a rough estimate for the trade-off...

I think this solution is particularly fitting for net news, where the
variations in the format of the contents of a given transmission are
finite and lexically determinable. One can generalize this *active*
transmission approach to other networks (i.e. compress selectively based
on the contents of the message). But I think one of the
problems of the news network is its passive approach (i.e. send down
everything, everywhere).  Another way to look at this is that the
"References:" field currently applies only at file-level granularity.
This would take that to line-level (or even byte-level)...

I apologize for the length of this note, but I thought an extended
elaboration would help present the solution better... I've addressed all
followups to *net.news* to avoid clogging other groups. I've also
included net.unix-wizards since other people there would be more
familiar with the guts of the news system and the applicability of
this suggestion...


Looking forward to hearing people's thoughts on this...

ramin

-- 
=--------------------------------------=-------------------------------------=
: alias: ramin firoozye'               :   USps: Systems Control Inc.        :
: uucp: ...!shasta \                   :         1801 Page Mill Road         :
:       ...!lll-lcc \                  :         Palo Alto, CA  94303        :
:       ...!ihnp4    \...!ramin@rtgvax :   ^G:   (415) 494-1165 x-1777       :
=--------------------------------------=-------------------------------------=

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!henry
From: henry@utzoo.UUCP (Henry Spencer)
Newsgroups: net.news
Subject: Re: usenet volume problems...
Message-ID: <6803@utzoo.UUCP>
Date: Sun, 15-Jun-86 03:12:52 EDT
Article-I.D.: utzoo.6803
Posted: Sun Jun 15 03:12:52 1986
Date-Received: Sun, 15-Jun-86 03:12:52 EDT
References: <74@rtgvax.UUCP>
Organization: U of Toronto Zoology
Lines: 60
Keywords: ...one solution

> ...the actual *volume* of quotes in follow-ups
> constitute a pretty major factor in the traffic problem.
> ... The solution immediately
> coming to mind is to have the posting software replace all quoted
> regions with a [reference to the original article]

This is, in general, a fairly good idea.  Unfortunately, it doesn't work
very well.  The problem is that articles do not necessarily reach all
sites in the same order or at the same time.  It is quite possible for
the "original" article to reach a site some while after the followup.
This is the origin of the stupid "Orphaned Response" swill from notes,
which attempts to pretend that there is a uniform ordering of articles
across the network.  Alternatively, it is possible for the followup to
arrive quite a while after the original, so the original has already
expired.

> The only problem here is obviously that the quotes cannot extend beyond
> the date news is expired. The solution here can be to expire selective
> newsgroups over longer periods (i.e. technical ones) and the ones
> currently proposed in the talk groups with shorter periods...

Many sites are already selective about expiry dates, but for other reasons.
Small sites often simply cannot afford to keep even the technical groups
around for two weeks or so; they don't have the disk space.  Other sites
consider the disk-space/benefit ratio of the noisy groups too low to keep
them around.  Note that the noisy groups would be the ones that would
really benefit from the quote-compression scheme, so it's got to work for
them to make it worthwhile.

> Since the life of an article and its relevance would generally depend on
> the expiration cycle of the newsgroup... the necessity for temporary groups
> such as net.politics.terror would be eliminated since discussions should
> generally run-down shortly after most of the quoted articles expire.

This is a conjecture, not an obvious fact.  Also, the purpose of the
temporary groups is not to shorten discussions, but to move particularly
noisy discussions out of ordinary groups.  It's all very well to say that
expiry times would limit the life of the discussion, but that doesn't
address the problem of newsgroup pollution *during* the discussion.

> For important matters, a mechanism could be implemented to allow users
> (or the system manager on their behalf) to request a copy of
> an expired article from a central archive to access a quote beyond the 
> expiration date of that article...

Who's going to maintain that archive?  Please understand that this is *not*
a trivial undertaking.  The accumulated archives of Usenet since we joined
it (quite early) total something like 1.2 gigabytes, and the rate of
accumulation is rising steadily.  Nobody is going to keep that mass online,
and nobody is going to want to mount tapes for the sake of such requests.
Just keeping the last couple of months would suffice, but even that means
something like 100 megabytes of news.  If you are volunteering to be the
archive site, fine, but please don't assume that other volunteers will
leap forward.  We are very well supplied with people proposing nifty ideas
for *somebody* *else* to implement.  We "somebody elses" are getting very
tired of this.
-- 
Usenet(n): AT&T scheme to earn
revenue from otherwise-unused	Henry Spencer @ U of Toronto Zoology
late-night phone capacity.	{allegra,ihnp4,decvax,pyramid}!utzoo!henry
