questions about 2007 Legal Track routing & interassessor (in)consistency



As we've been discussing, interassessor consistency is bad and/or  
strange for many 2006 Legal Track ad hoc topics.   That raises  
several questions:

1.  Should we omit some of these topics from the 2007 Legal Track  
routing evaluation because we already have evidence that they are  
susceptible to high interassessor inconsistency?  This would probably  
mean the same topics should be dropped from the test collection  
entirely, since the 2006 ad hoc pool is of questionable quality (only  
6 participants, many technical problems).  If so, what should the
test be for dropping a topic?


2.  I had envisioned that, independent of the style of routing
evaluation (e.g. residual collection), the union of 2006 ad hoc
qrels and the 2007 routing qrels would be used as the qrels for these  
topics going forward.  But maybe this is unrealistic given the high  
levels of interassessor inconsistency.  So how should the final qrels  
for the collection be produced:

     2a. Go ahead and take the union?

     2b.  Distribute two alternate sets of qrels with the collection,  
one based on 2006 ad hoc and one based on 2007 routing?

     2c. Distribute only the 2007 routing qrels?

     2d. Take the union of the 2006 ad hoc relevant with the 2007
routing relevant and nonrelevant (a kind of maximally broad
definition of relevance)?
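
To make the difference between the four options concrete, here is a minimal Python sketch. It assumes qrels are held as a mapping from (topic, docno) to a binary judgment; the topic and document identifiers are invented, and the tie-break in 2a (a document counts as relevant if either round judged it relevant) is an assumption, since nothing above says how disagreements would be resolved.

```python
# Toy judgments: {(topic, docno): 1 for relevant, 0 for nonrelevant}.
# All identifiers below are invented for illustration.
adhoc_2006 = {
    ("T1", "docA"): 1,
    ("T1", "docB"): 0,
    ("T1", "docC"): 1,
    ("T1", "docE"): 0,
}
routing_2007 = {
    ("T1", "docB"): 1,
    ("T1", "docC"): 0,
    ("T1", "docD"): 0,
}


def union_qrels(first, second):
    """Option 2a: keep every judged document from both rounds.

    Assumed tie-break: if the two rounds disagree, the document is
    treated as relevant.
    """
    merged = dict(first)
    for key, judgment in second.items():
        merged[key] = max(merged.get(key, 0), judgment)
    return merged


def broad_qrels(old, new):
    """Option 2d: all new judgments, plus only the old relevant documents."""
    merged = dict(new)
    for key, judgment in old.items():
        if judgment == 1:
            merged[key] = 1  # old relevant overrides a new nonrelevant
    return merged


option_2a = union_qrels(adhoc_2006, routing_2007)
option_2b = {"adhoc_2006": adhoc_2006, "routing_2007": routing_2007}
option_2c = dict(routing_2007)
option_2d = broad_qrels(adhoc_2006, routing_2007)
```

Under that tie-break, 2a and 2d agree on which documents are relevant. They differ only in that 2a keeps the 2006 nonrelevant judgments (docE here), while 2d leaves those documents unjudged.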


3. If the answer to 2 is 2a, 2b, or 2d, should qrels from 2006 ad hoc  
Assessor 2 be thrown in as well?


4. If the answer to 2 is 2b or 2c, should we have the relevant
documents from 2006 ad hoc thrown into the 2007 routing pools to be
reassessed, as a particularly rich source of relevant documents?


5. Do the answers to the above questions change the best strategy for
evaluating routing?  In particular, if we adopt 2c (with either answer
to 4), is residual collection evaluation still necessary?
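
For reference, here is a rough Python sketch of what residual collection evaluation does, assuming the usual meaning: documents already judged in the earlier round (and so available as training data) are removed from each run before it is scored. The function names, document identifiers, and the choice of precision at k as the measure are illustrative only.

```python
def residual_ranking(ranking, previously_judged):
    """Drop documents that were already judged in the earlier round."""
    return [doc for doc in ranking if doc not in previously_judged]


def precision_at_k(ranking, relevant, k):
    """Fraction of the top k ranked documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Invented example: docA and docB were judged in 2006, so a routing
# system could simply return them from its training data.
run = ["docA", "docB", "docD", "docE", "docF"]
judged_2006 = {"docA", "docB", "docC"}
relevant = {"docA", "docB", "docF"}

full_score = precision_at_k(run, relevant, 2)
residual_run = residual_ranking(run, judged_2006)
residual_score = precision_at_k(residual_run, relevant, 2)
```

In this toy case the run scores 1.0 on the full collection and 0.0 on the residual collection, which is the kind of training-data effect the residual approach is meant to remove.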


6. Should additional studies of interassessor consistency be built  
into the routing evaluation?  If so, what?   Should we keep the  
option open of omitting some 2006 routing topics from the final  
collection?
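
As one possible starting point for such a study, here is a small Python sketch of two common agreement measures: overlap of the relevant sets (intersection over union) and Cohen's kappa over the documents both assessors judged. This is only an illustration with invented judgments, not a proposal for which measure to adopt.

```python
def overlap(relevant_a, relevant_b):
    """Size of the intersection of two relevant sets over their union."""
    union = relevant_a | relevant_b
    return len(relevant_a & relevant_b) / len(union) if union else 1.0


def cohens_kappa(judgments_a, judgments_b):
    """Cohen's kappa for binary judgments on doubly judged documents."""
    shared = sorted(set(judgments_a) & set(judgments_b))
    n = len(shared)
    if n == 0:
        raise ValueError("no documents were judged by both assessors")
    observed = sum(judgments_a[d] == judgments_b[d] for d in shared) / n
    p_a = sum(judgments_a[d] for d in shared) / n  # assessor A's relevant rate
    p_b = sum(judgments_b[d] for d in shared) / n  # assessor B's relevant rate
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)   # chance agreement
    if expected == 1.0:
        return 1.0  # both assessors gave one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented judgments: 1 = relevant, 0 = nonrelevant.
assessor_1 = {"d1": 1, "d2": 1, "d3": 0, "d4": 0}
assessor_2 = {"d1": 1, "d2": 0, "d3": 0, "d4": 0}

set_overlap = overlap(
    {d for d, j in assessor_1.items() if j == 1},
    {d for d, j in assessor_2.items() if j == 1},
)
kappa = cohens_kappa(assessor_1, assessor_2)
```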

Dave



