Date: 	Mon, 9 Apr 2001 20:13:11 +0530
From: Maneesh Soni <smane...@in.ibm.com>
To: lse tech <lse-t...@lists.sourceforge.net>
Cc: lkml <linux-ker...@vger.kernel.org>, Paul.McKen...@us.ibm.com, a...@suse.de,
        haw...@engr.sgi.com, dipan...@sequent.com
Subject: [RFC][PATCH] Scalable FD Management using Read-Copy-Update


Scalable FD Management Using Read-Copy-Update
---------------------------------------------

This patch provides a significant performance improvement in file
descriptor management for the SMP Linux kernel on a 4-way machine, with
the expectation of even higher gains on higher-end machines. The patch
uses the read-copy-update mechanism for Linux, published earlier on the
SourceForge site under the Linux Scalability Effort project:
       http://lse.sourceforge.net/locking/rclock.html

On SMP kernels, performance is limited by the reader-writer lock taken during
the various calls that use files_struct, the majority of them in the routine
fget(). Although there is no severe contention for files->file_lock, since
files_struct is a per-task data structure, a significant performance penalty
is paid merely to acquire the read lock, because the lock's cache line bounces
between CPUs when multiple clones share the same files_struct. This was
pointed out by John Hawkes in his posting to the lse-tech mailing list:
       http://marc.theaimsgroup.com/?l=lse-tech&m=98235007317770&w=2
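
For reference, the read side in question looks roughly like this in the stock
2.4 kernel (a paraphrase of fs/file_table.c, trimmed for illustration, with
fcheck() inlined for clarity):

struct file *fget(unsigned int fd)
{
        struct file *file = NULL;
        struct files_struct *files = current->files;

        /* The read lock excludes concurrent table expansion, but merely
         * acquiring it writes the lock word, so its cache line bounces
         * between all CPUs whose tasks share this files_struct. */
        read_lock(&files->file_lock);
        if (fd < files->max_fds)        /* inlined fcheck(fd) */
                file = files->fd[fd];
        if (file)
                get_file(file);         /* bump file->f_count */
        read_unlock(&files->file_lock);
        return file;
}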

The improvement in performance while running the "chat" benchmark
(from http://lbs.sourceforge.net/) is about 30% in average throughput.
For both configurations the results compare the base kernel (2.4.2) against
the base kernel with files_struct_rcu-2.4.2-0.1.patch. Profiling results were
also collected using SGI's kernprof utility; they show a considerable decrease
in the amount of time spent in fget(). The "chat" benchmark was run with
rooms=20 and messages=500. For each configuration the test was run 50 times
and the average throughput, in messages per second, was taken.

1. 4-way PIII Xeon 700 MHz with 1MB L2 Cache and 1GB RAM 
========================================================

Chat benchmark results
----------------------	
Kernel version                          Average Throughput (messages/sec)

2.4.2                                   191986
2.4.2+files_struct_rcu-2.4.2-0.1.patch  253083

		Improvement = 31.8%

kernprof results
---------------

Kernel Version - 2.4.2 

default_idle [C01071EC]: 150696
schedule [C0112EE4]: 105452
__wake_up [C0113518]: 74030
tcp_sendmsg [C020FB14]: 29201
fget [C013436C]: 16318
__generic_copy_to_user [C023A13C]: 15477
USER [C0121DF4]: 12925
tcp_recvmsg [C0210A68]: 7737
system_call [C0109150]: 7399
mcount [C023A4E4]: 5509

Kernel version  - 2.4.2+files_struct_rcu-2.4.2-0.1.patch

schedule [C0113174]: 101392
__wake_up [C01137D4]: 68182
default_idle [C01071EC]: 32833
tcp_sendmsg [C021ECE4]: 29318
__generic_copy_to_user [C024930C]: 15472
USER [C0122A00]: 12803
tcp_recvmsg [C021FC38]: 8170
system_call [C0109150]: 7636
mcount [C02496B4]: 6150
fget [C0134F58]: 5694

With this patch, the routine fget() gets about 65% fewer hits.
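(From the profiles above: (16318 - 5694) / 16318 = 65.1%.)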

2. 2-way PIII Xeon 700 MHz with 1MB L2 Cache and 1GB RAM 
========================================================

Chat benchmark results
----------------------	
Kernel version                          Average Throughput (messages/sec)

2.4.2                                   209592
2.4.2+files_struct_rcu-2.4.2-0.1.patch  222729

		Improvement = 6.2%

kernprof results
-----------------
Kernel Version - 2.4.2

schedule [C0112EE4]: 37583
tcp_sendmsg [C020FB14]: 23381
default_idle [C01071EC]: 20025
__wake_up [C0113518]: 16141
fget [C013436C]: 12733
USER [C0121DF4]: 12474
__generic_copy_to_user [C023A13C]: 12046
tcp_recvmsg [C0210A68]: 7678
system_call [C0109150]: 7113
mcount [C023A4E4]: 4789

Kernel version  - 2.4.2+files_struct_rcu-2.4.2-0.1.patch

default_idle [C01071EC]: 39086
schedule [C0113174]: 37278
tcp_sendmsg [C021ECE4]: 23148
__wake_up [C01137D4]: 15788
USER [C0122A00]: 12322
__generic_copy_to_user [C024930C]: 12138
tcp_recvmsg [C021FC38]: 7615
system_call [C0109150]: 7481
mcount [C02496B4]: 5310
fget [C0134F58]: 4858

The results for the 4-way machine are much better than for the 2-way, which
shows that this patch is even more necessary for higher-end SMP systems.
Results for an 8-way will be published soon.

With this patch, updates to the files_struct, mainly expanding the fd_array
and expanding the bitmaps, are done using a callback mechanism. In the routine
expand_fd_array(), once the new array has been allocated, filled with the old
entries, and linked into the files_struct, a callback is registered to free
the old fd_array. Any existing users in the kernel keep using the old
fd_array, while new users see the expanded one. Once a quiescent state is
reached, the callback fires and the old fd_array is freed. Along similar
lines, the routine expand_fd_set() has been modified to use a callback to free
the old bitmap. A sketch of this update path is given below.
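
The sketch below is illustrative only, not the patch's exact code: the
callback primitive call_rcu() and the rcu_head field are assumed from the
rclock patch's interface, zeroing of the new slots is omitted, and the real
code must pair the free with however the array was actually allocated:

/* Deferred free of the old fd_array: runs only after every CPU has
 * passed through a quiescent state, so no reader can still hold a
 * reference to the old array. */
static void free_old_fd_array(void *arg)
{
        kfree(arg);     /* assumes a kmalloc'ed array, for illustration */
}

/* Caller holds files->file_lock for writing, as in stock 2.4. */
int expand_fd_array(struct files_struct *files, int nr)
{
        struct file **new_fds, **old_fds;

        new_fds = alloc_fd_array(nr);
        if (!new_fds)
                return -ENOMEM;
        memcpy(new_fds, files->fd, files->max_fds * sizeof(struct file *));
        old_fds = files->fd;
        wmb();                  /* new array contents before the pointer */
        files->fd = new_fds;    /* new readers now see the expanded array */
        wmb();                  /* the pointer before the enlarged bound  */
        files->max_fds = nr;
        /* existing readers may still be walking old_fds: register a
         * callback instead of freeing it immediately */
        call_rcu(&files->rcu_head, free_old_fd_array, old_fds);
        return 0;
}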

Known Race condition
--------------------
There could be a race condition involving sys_close() and fget() if both run
against the same files_struct on different CPUs. Although this problem has not
occurred so far, it is theoretically possible, and it is being worked on. A
patch with the solution will follow soon.


Usage Information
-----------------
The patch is built against the 2.4.2 kernel. Before applying it, the user must
first apply the read-copy-update patch for Linux (rclock-2.4.2-0.1.patch),
which can be obtained from
       http://lse.sourceforge.net/locking/rclock.html

The config options required for this patch are CONFIG_RCLOCK and CONFIG_FD_RCU.
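
With both enabled, the relevant fragment of .config would read (assuming each
option is simply set to y):

CONFIG_RCLOCK=y
CONFIG_FD_RCU=y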




Regards,
Maneesh

-- 
Maneesh Soni <smane...@in.ibm.com>
IBM Linux Technology Center,
IBM Software Lab, Bangalore, India


Patch


Date: 	Wed, 11 Apr 2001 18:29:30 -0700
From: Anton Blanchard <an...@samba.org>
To: Maneesh Soni <smane...@in.ibm.com>
Cc: lse tech <lse-t...@lists.sourceforge.net>,
        lkml <linux-ker...@vger.kernel.org>, Paul.McKen...@us.ibm.com,
        a...@suse.de, haw...@engr.sgi.com, dipan...@sequent.com
Subject: Re: [RFC][PATCH] Scalable FD Management using Read-Copy-Update


> This patch provides a significant performance improvement in file
> descriptor management for the SMP Linux kernel on a 4-way machine, with
> the expectation of even higher gains on higher-end machines. The patch
> uses the read-copy-update mechanism for Linux, published earlier on the
> SourceForge site under the Linux Scalability Effort project:
>        http://lse.sourceforge.net/locking/rclock.html

Good stuff!

It would be interesting to try a filesystem benchmark such as dbench. On
a quad PPC fget was chewing up more than its fair share of cpu time.

Anton

Date: 	Thu, 12 Apr 2001 21:13:54 +0530
From: Maneesh Soni <smane...@in.ibm.com>
To: Anton Blanchard <an...@samba.org>
Cc: tri...@samba.org, lkml <linux-ker...@vger.kernel.org>,
        lse tech <lse-t...@lists.sourceforge.net>
Subject: Re: [Lse-tech] Re: [RFC][PATCH] Scalable FD Management using Read-Copy-Update

On Wed, Apr 11, 2001 at 06:29:30PM -0700, Anton Blanchard wrote:
> 
> > This patch provides a significant performance improvement in file
> > descriptor management for the SMP Linux kernel on a 4-way machine, with
> > the expectation of even higher gains on higher-end machines. The patch
> > uses the read-copy-update mechanism for Linux, published earlier on the
> > SourceForge site under the Linux Scalability Effort project:
> >        http://lse.sourceforge.net/locking/rclock.html
> 
> Good stuff!
> 
> It would be interesting to try a filesystem benchmark such as dbench. On
> a quad PPC fget was chewing up more than its fair share of cpu time.
> 
> Anton

Hello Anton,

Thank you for your suggestion. I tried dbench on a 4-way PIII Xeon box with
1MB L2 cache and 1GB of RAM. I ran it on an 8 GB ext2 partition with an
Adaptec 7896 SCSI controller. I ran "dbench 100" and "dbench 200" five times
each and took the average throughput:

Base (2.4.2) - 
        100 Average Throughput = 39.628  MB/sec
        200 Average Throughput = 22.792  MB/sec

Base + files_struct patch - 
        100 Average Throughput = 39.874 MB/sec
        200 Average Throughput = 23.174 MB/sec  
         
These values are quite a bit lower than the ones in the README distributed
with the dbench tarball. I think the numbers in the README were for a similar
machine but with a 2.2.9 kernel.

As you can see, the performance with the files_struct patch is almost the same
as the base. I suspect I am hitting some bottleneck other than fget() in both
the base and the patched versions; at least for the base version I would
expect numbers similar to those mentioned in the README for a comparable
configuration. I intend to do some profiling to look into this, but it would
be helpful if you could tell me whether there is some known issue here.

I am also copying Andrew, in case he can help. If you have any dbench numbers
from a 2.4.x kernel, please let me have a look at those as well.

Thank you,
Maneesh


-- 
Maneesh Soni <smane...@in.ibm.com>
IBM Linux Technology Center,
IBM Software Lab, Bangalore, India

Date: 	Thu, 12 Apr 2001 08:51:18 -0700
From: Anton Blanchard <an...@samba.org>
To: Maneesh Soni <smane...@in.ibm.com>
Cc: tri...@samba.org, lkml <linux-ker...@vger.kernel.org>,
        lse tech <lse-t...@lists.sourceforge.net>
Subject: Re: [Lse-tech] Re: [RFC][PATCH] Scalable FD Management using Read-Copy-Update

 
Hi,

> Base (2.4.2) - 
>         100 Average Throughput = 39.628  MB/sec
>         200 Average Throughput = 22.792  MB/sec
> 
> Base + files_struct patch - 
>         100 Average Throughput = 39.874 MB/sec
>         200 Average Throughput = 23.174 MB/sec  
>          
> These values are quite a bit lower than the ones in the README distributed
> with the dbench tarball. I think the numbers in the README were for a similar
> machine but with a 2.2.9 kernel.

If you guesstimate that each dbench client uses about 20M of RAM, then dbench
100 has no chance of remaining in memory. Once you hit disk, spinlock
optimisations are all in the noise :) Smaller runs (< 30) should see
it stay in memory.

Also, if you turn off kupdated (so old buffers are not flushed out just
because they are old) and make the VM more aggressive about filling
memory with dirty buffers, then you will not hit the disk and
hopefully the optimisations will be more obvious.

killall -STOP kupdated
echo "90 64 64 256 500 3000 95 0 0" > /proc/sys/vm/bdflush

Remember to killall -CONT kupdated when you are finished :)

> I am also copying Andrew, in case he can help. If you have any dbench numbers
> from a 2.4.x kernel, please let me have a look at those as well.

The single CPU 333MHz POWER 3 I was playing with got 100MB/s when not
touching disk.

Anton

Date: 	Mon, 16 Apr 2001 12:16:25 -0400 (EDT)
From: Mark Hahn <h...@coffee.psychology.mcmaster.ca>
To: lkml <linux-ker...@vger.kernel.org>
Subject: Re: [RFC][PATCH] Scalable FD Management using Read-Copy-Update

> The improvement in performance while running the "chat" benchmark
> (from http://lbs.sourceforge.net/) is about 30% in average throughput.

isn't this a solution in search of a problem?
does it make sense to redesign parts of the kernel for the sole
purpose of making a completely unrealistic benchmark run faster?

(the chat "benchmark" is a simple pingpong load-generator; it is
not in the same category as, say, specweb, since it does not do *any*
realistic (nonlocal) IO.  the numbers "chat" returns are interesting,
but not indicative of any problem; perhaps even less than lmbench
components.)


Date: 	Tue, 17 Apr 2001 14:58:09 +0530
From: Dipankar Sarma <dipan...@sequent.com>
To: Mark Hahn <h...@coffee.psychology.mcmaster.ca>
Cc: lkml <linux-ker...@vger.kernel.org>
Subject: Re: [RFC][PATCH] Scalable FD Management using Read-Copy-Update

Hi Mark,

[I am not sure whether my earlier mail from Lycos went out; if it did,
I apologize.]

On Mon, Apr 16, 2001 at 12:16:25PM -0400, Mark Hahn wrote:
> > The improvement in performance while running the "chat" benchmark
> > (from http://lbs.sourceforge.net/) is about 30% in average throughput.
> 
> isn't this a solution in search of a problem?
> does it make sense to redesign parts of the kernel for the sole
> purpose of making a completely unrealistic benchmark run faster?

Irrespective of the usefulness of the "chat" benchmark, it seems
that there is a problem of scalability as long as CLONE_FILES is
supported. John Hawkes (SGI) posted some nasty numbers from a
32-CPU MIPS machine on the lse-tech list some time ago.


> 
> (the chat "benchmark" is a simple pingpong load-generator; it is
> not in the same category as, say, specweb, since it does not do *any*
> realistic (nonlocal) IO.  the numbers "chat" returns are interesting,
> but not indicative of any problem; perhaps even less than lmbench
> components.)

"chat" results for large numbers of CPUs is indicative of a problem -
if a large number of threads share the file_struct through
CLONE_FILES, the performance of the application will deteriorate
beyond 8 CPUs (going by John's numbers). It also indicates how
sensitive can performance be to write access of shared-memory
locations like spin-waiting locks.


Thanks
Dipankar
-- 
Dipankar Sarma  (dipan...@sequent.com)
IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net

Date: 	Tue, 17 Apr 2001 12:59:06 -0400 (EDT)
From: Mark Hahn <h...@coffee.psychology.mcmaster.ca>
To: lkml <linux-ker...@vger.kernel.org>
Subject: Re: [RFC][PATCH] Scalable FD Management using Read-Copy-Update

> > isn't this a solution in search of a problem?
> > does it make sense to redesign parts of the kernel for the sole
> > purpose of making a completely unrealistic benchmark run faster?
> 
> Irrespective of the usefulness of the "chat" benchmark, it seems
> that there is a problem of scalability as long as CLONE_FILES is
> supported. John Hawkes (SGI) posted some nasty numbers from a
> 32-CPU MIPS machine on the lse-tech list some time ago.

that's not the point.  the point is that this has every sign of 
being premature optimization.  the "chat" benchmark does no work,
it only generates load.  and yes, indeed, you can cause contention
if you apply enough load in the right places.  this does NOT indicate
that any real apps apply the same load in the same places.


Date: 	Tue, 17 Apr 2001 16:19:16 +0530
From: Maneesh Soni <smane...@in.ibm.com>
To: Anton Blanchard <an...@samba.org>
Cc: Paul.McKen...@us.ibm.com, dipan...@sequent.com,
        lkml <linux-ker...@vger.kernel.org>,
        lse tech <lse-t...@lists.sourceforge.net>
Subject: Re: Scalable FD Management ....

On Thu, Apr 12, 2001 at 08:51:18AM -0700, Anton Blanchard wrote:
[...]
> Also, if you turn off kupdated (so old buffers are not flushed out just
> because they are old) and make the VM more aggressive about filling
> memory with dirty buffers, then you will not hit the disk and
> hopefully the optimisations will be more obvious.
>
> killall -STOP kupdated
> echo "90 64 64 256 500 3000 95 0 0" > /proc/sys/vm/bdflush
>
> Remember to killall -CONT kupdated when you are finished :)
[...]
> The single CPU 333MHz POWER 3 I was playing with got 100MB/s when not
> touching disk.

Hi Anton,

I ran the dbench test as per your suggestions, and I now get throughput
numbers similar to yours.

But the throughput improvement from my patch is still not there. The reason, I
think, is that I did not get many hits in the fget() routine. It would be
helpful if you could tell me how you got fget() chewing up more than its fair
share of CPU time.

For 30 clients:
      Base(2.4.2)                - Average Throughput = 235.139 MB/sec
      Base + files_struct patch  - Average Throughput = 235.751 MB/sec

I also did profiling while running these tests, using kernprof. The fget()
hits are as below:
      Base(2.4.2)                   304
      Base + files_struct patch     189

The kernprof sample size is quite large (the top entry in the profile is
"default_idle" with 28471 hits), so fget's hit count is quite low compared
to "default_idle".

As you can also see, the files_struct patch reduces the number of hits in
fget() by around 37%.
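(That is, (304 - 189) / 304 = 37.8%.)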

I also looked at the dbench.c code. It creates a number of child processes,
but with fork() and not through __clone().

I think fget() will affect performance in scenarios where the children are
created using clone() with the CLONE_FILES flag set, i.e. when many child
processes share the parent's files_struct and everybody tries to acquire the
same files->file_lock. In those scenarios we should see a considerable
performance improvement from the files_struct patch, as with the "chat"
benchmark. A minimal sketch of that pattern follows.

Regards,
Maneesh

-- 
Maneesh Soni <smane...@in.ibm.com>
http://lse.sourceforge.net/locking/rclock.html
IBM Linux Technology Center,
IBM Software Lab, Bangalore, India
