Number of runs



Hi,

We need to decide how many runs any one group can submit for the
search and feature extraction tasks.

I've looked at last year's judging effort as a function of the number
of tasks and topics, the sizes of the result sets, the lack of pooling,
the double judgments, and the time needed to do the judging.

This year we have larger result set sizes, but standard shots, which 
should allow pooling. We also have almost 4 weeks of assessor time
rather than 2. 

I suggest, based on last year's results and the sharing of features
possible this year, that we drop the requirement that any run using 
ASR (X+ASR) be accompanied by a run using only ASR or just X. I also
suggest we do only single judgments - the inter-judge agreement was
quite high last year.

Given all that, I believe this year we can accept the following number 
of runs:

	Feature extraction 	2 runs (max)
	Search			4 runs (max)

We allow more search runs than feature extraction runs because of the
difference in result set size (assuming the sizes bear on average some
relation to the actual number of true positives) and because the
availability of donated feature sets invites the evaluation of various
combinations.

We think we can judge a significant part of these runs. We will then
evaluate all the submitted runs based on the truth data created by the 
judging.

Obviously we are juggling various tradeoffs - e.g., number of runs,
depth to which we judge each result set. If these plans run dramatically
counter to yours, please let me know directly.

Thanks,
Paul

-- 
Paul Over - Retrieval Group
	    Information Access Division
	    Information Technology Laboratory
	    National Institute of Standards and Technology
	    Bldg. 225  Rm. A211  (Mailstop 8940)
	    Gaithersburg, MD  20899-8940   USA
	    Voice: 301 975-6784    Fax: 301 975-5287


